Working with Economic data in Python

This notebook introduces you to working with data in Python. You can use packages like NumPy to manipulate and do computations with arrays, matrices, and similar objects (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It works well with other packages that complement it, which makes it a very powerful tool for data analysis.

With pandas you can

  1. Import many types of data, including
    • CSV files
    • Tab or other types of delimited files
    • Excel (xls, xlsx) files
    • Stata files
  2. Open files directly from a website
  3. Merge, select, join data
  4. Perform statistical analyses
  5. Create plots of your data

and much more. Let's start by importing pandas and use it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make using other people's code and snippets easier.

In [1]:
# Let's import pandas and some other basic packages we will use 
from __future__ import division
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np
Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib
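
To make the list of import formats above concrete, here is a minimal sketch of reading comma- and tab-delimited data with pd.read_csv. The two text snippets are made up for illustration and are read from memory with io.StringIO, which stands in for a file path or a URL.

```python
import io
import pandas as pd

# Made-up data held in memory; in practice you would pass a file path or a URL
csv_text = "country,gdppc\nColombia,100\nChile,200\n"
tsv_text = "country\tgdppc\nColombia\t100\nChile\t200\n"

# Comma is the default delimiter of read_csv
from_csv = pd.read_csv(io.StringIO(csv_text))
# Any other delimiter is selected with the sep argument
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both calls produce the same table
print(from_csv.equals(from_tsv))
```

Excel and Stata files are read in the same spirit with pd.read_excel and pd.read_stata, both of which are used later in this notebook.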

Working with Pandas

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and has a large set of functions that can be easily performed on it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column contains a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.

In [2]:
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')
0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned a name to the series that is different from the name of the variable containing the series. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series is composed only of strings, so pandas assigns it the object dtype (not important for now, but we will use this later to convert data between types, e.g. strings to integers or floats or the other way around).
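
As a preview of the type conversions mentioned above, here is a small sketch using astype. The series is made up for illustration and is not part of the notebook's data.

```python
import pandas as pd

# Made-up example: numbers that arrived as text
as_text = pd.Series(['1', '2', '3'], name='example')

as_int = as_text.astype(int)        # strings -> integers
as_float = as_text.astype(float)    # strings -> floats
back_to_text = as_int.astype(str)   # integers -> strings

print(as_text.dtype, as_int.dtype, as_float.dtype)
```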

Let's create the data using some of the functions we already learned.

In [3]:
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())
0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the mean() function of the series to compute its mean. There are many other properties/functions for these series including std(), shape, count(), max(), min(), etc. You can access these by writing series.name_of_function_or_property. To see which functions are available, type the name of the series followed by a dot and hit tab.
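
A short sketch of a few of these functions and properties, using a made-up series so the numbers are easy to check by hand:

```python
import pandas as pd

# Made-up numbers, only for illustration
s = pd.Series([1.0, 2.0, 3.0, 4.0], name='example')

print(s.mean(), s.std(), s.min(), s.max())
print(s.shape, s.count())   # shape is a property, count() is a function
print(s.describe())         # several summary statistics at once
```

Note that std() computes the sample standard deviation (it divides by n-1) by default.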

Let's create a pd.DataFrame using these two series.

In [4]:
df = pd.DataFrame([countries, data])
df
Out[4]:
0 1 2 3 4
country Colombia Turkey USA Germany Chile
noise 0.469112 -0.282863 -1.50906 -1.13563 1.21211

Not exactly what we'd like, but don't worry, we can just transpose it so it has each country with its data in a row.

In [5]:
df = df.T
df
Out[5]:
country noise
0 Colombia 0.469112
1 Turkey -0.282863
2 USA -1.50906
3 Germany -1.13563
4 Chile 1.21211
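
One caveat with the approach above is that stacking series of different types as rows and then transposing typically leaves every column with the generic object dtype. A possible alternative, sketched below with made-up values, is to place the series side by side with pd.concat, which keeps each series' own dtype.

```python
import numpy as np
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA'], name='country')
noise = pd.Series([0.5, -0.3, -1.5], name='noise')  # made-up values

# Rows first, then transpose (the approach used above)
df_t = pd.DataFrame([countries, noise]).T
# Concatenate along the columns instead
df_c = pd.concat([countries, noise], axis=1)

print(df_t.dtypes)
print(df_c.dtypes)
```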

Now let us add some more data to this dataframe. This is easily done by defining new columns. Let's create the square of noise, create the sum of noise and its square, and get the length of the country's name.

In [6]:
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df
Out[6]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
1 Turkey -0.282863 0.0800117 -0.202852 6
2 USA -1.50906 2.27726 0.768199 3
3 Germany -1.13563 1.28966 0.154029 7
4 Chile 1.21211 1.46922 2.68133 5

This shows some of the ways in which you can create new data. Especially useful is the apply method, which applies a function to the series. You can also apply a function to the whole dataframe, which is useful if you want to perform computations using various columns.
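
To illustrate the difference between applying a function to a series and to the whole dataframe, here is a minimal sketch. The dataframe and the label column are made up for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'country': ['Colombia', 'USA'], 'noise': [0.5, -1.5]})  # made-up values

# Series.apply: the function receives one value at a time
df_demo['name length'] = df_demo['country'].apply(len)

# DataFrame.apply with axis=1: the function receives one row at a time,
# so it can combine several columns
df_demo['label'] = df_demo.apply(lambda row: f"{row['country']} ({row['noise']:+.1f})", axis=1)

print(df_demo)
```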

Now, let's plot the various series in the dataframe

In [7]:
df.plot()
Out[7]:

Not very nice or useful. Notice that the row numbers were used as the x-axis labels. Let's change the row labels, which are stored in the dataframe's index, by setting the country names as the index.

In [8]:
df = df.set_index('country')
df.plot()
Out[8]:

Better, but still not very informative. Below we will improve on this when we work with some real data.

Getting data

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this, and also to revisit some of the basic facts of comparative development, let's download some data from various sources. This may require you to create accounts in order to access and download the data (sometimes the process is very simple and does not require an actual project...in other cases you need to propose a project and be approved...usually due to privacy concerns with micro-data). Don't be afraid, all these sources are free and are used a lot in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data

Censuses, Surveys, and other micro-level data

  • IPUMS: provides census and survey data from around the world integrated across time and space.
  • General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
  • European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
  • UK Data Service is the UK’s largest collection of social, economic and population data resources.
  • SHRUG is The Socioeconomic High-resolution Rural-Urban Geographic Platform for India. Provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns using a set of a common geographic identifiers that span 25 years.

Divergence - Big time

To study the divergence across countries let's download and plot the historical GDP and population data. In order to keep the data and not have to download it from scratch every time, we'll create a folder ./data in the current directory and save each file there. Also, we'll make sure that if the data does not exist, we download it. We'll use the os package to create directories.

Setting up paths

In [9]:
import os

pathout = './data/'

if not os.path.exists(pathout):
    os.mkdir(pathout)
    
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)
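
A slightly more compact variant of the check-then-create pattern above is os.makedirs with exist_ok=True, which does nothing if the folder already exists. The sketch below uses a temporary directory (an assumption made only so that the example does not touch your working directory).

```python
import os
import tempfile

# A temporary directory stands in for the notebook's working directory
base = tempfile.mkdtemp()
pathout_demo = os.path.join(base, 'data')

os.makedirs(pathout_demo, exist_ok=True)   # creates the folder
os.makedirs(pathout_demo, exist_ok=True)   # second call is a no-op, no error

print(os.path.isdir(pathout_demo))
```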

Download New Maddison Project Data

In [12]:
try:
    maddison_new = pd.read_stata(pathout + 'Maddison2018.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:  # no local copy yet, so download and save the files
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018.dta')
    maddison_new.to_stata(pathout + 'Maddison2018.dta', write_index=False)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False)
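
The read-locally-or-download pattern in the cell above can be wrapped in a small helper. This is only a sketch under stated assumptions: load_cached and fake_download are hypothetical names, the cache is stored as CSV rather than Stata to keep the example short, and the download is replaced by a function that returns a made-up table so the example runs without a network connection.

```python
import os
import tempfile
import pandas as pd

def load_cached(path, fetch):
    """Read the CSV at path if it exists; otherwise call fetch() and save its result."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)
    return df

calls = []  # records how many times the "download" actually happens

def fake_download():
    # Stand-in for a real download, e.g. pd.read_stata(url)
    calls.append(1)
    return pd.DataFrame({'year': [1820, 1870], 'pop': [3280.0, 4207.0]})

cache_file = os.path.join(tempfile.mkdtemp(), 'demo.csv')
first = load_cached(cache_file, fake_download)    # downloads and saves
second = load_cached(cache_file, fake_download)   # reads the saved copy

print(len(calls))
```
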
In [13]:
maddison_new
Out[13]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820.0 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870.0 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913.0 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950.0 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951.0 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012.0 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013.0 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014.0 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015.0 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016.0 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns

This dataset is in long format. Also, notice that the year is stored as a float rather than an integer. Let's correct this.

In [14]:
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new
Out[14]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns
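
The conversion with astype(int) works here because no year is missing. If a column contains missing values, the same call raises an error. A possible workaround, sketched with a made-up series, is the nullable integer dtype 'Int64' (note the capital I).

```python
import numpy as np
import pandas as pd

years = pd.Series([1820.0, 1870.0, np.nan])  # made-up column with one missing year

try:
    years.astype(int)
except (ValueError, TypeError) as err:
    print('astype(int) failed:', err)

# The nullable integer dtype keeps the missing value as <NA>
years_nullable = years.astype('Int64')
print(years_nullable)
```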

Original Maddison Data

Now, let's download, save and read the original Maddison database. Since the original file is an Excel file with different data on each sheet, it will require us to use a different method to get all the data.

In [15]:
if not os.path.exists(pathout + 'Maddison_original.xls'):
    import urllib.request  # the request submodule must be imported explicitly
    dataurl = "http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_02-2010.xls"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xls')

Some data munging

This dataset is not very nicely structured for importing, as you can see if you open it in Excel. I suggest you do so, so that you can better see what is going on. Notice that the first two rows really have no data. Also, every second column is empty. Moreover, there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.

In [16]:
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="Population", skiprows=2)
maddison_old_pop
Out[16]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 2002 2003 2004 2005 2006 2007 2008 2009 Unnamed: 201 2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 NaN 700.0 NaN 2000.0 NaN 2500.0 NaN 2500.0 ... 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 NaN 8120.000
2 Belgium 300.0 NaN 400.0 NaN 1400.0 NaN 1600.0 NaN 2000.0 ... 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 NaN 10409.000
3 Denmark 180.0 NaN 360.0 NaN 600.0 NaN 650.0 NaN 700.0 ... 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 NaN 5730.488
4 Finland 20.0 NaN 40.0 NaN 300.0 NaN 400.0 NaN 400.0 ... 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 NaN 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. NaN 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. NaN 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. NaN 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. NaN 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. NaN 2308.205

278 rows × 203 columns

In [17]:
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc
Out[17]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 NaN 425.000000 NaN 707 NaN 837.200000 NaN 993.200000 ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 NaN 425.000000 NaN 875 NaN 975.625000 NaN 1144.000000 ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 NaN 400.000000 NaN 738.333 NaN 875.384615 NaN 1038.571429 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 NaN 400.000000 NaN 453.333 NaN 537.500000 NaN 637.500000 ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 NaN 424.767802 NaN 413.71 NaN 422.071584 NaN 420.628684 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 NaN 453.402162 NaN 566.389 NaN 595.783856 NaN 614.853602 ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names

In [18]:
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data

In [19]:
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]
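
The same column filter can also be written with a boolean mask on the column labels. The sketch below uses a small made-up dataframe that mimics the layout of the sheet (year columns separated by empty 'Unnamed' columns).

```python
import numpy as np
import pandas as pd

# Toy table mimicking the sheet: integer year labels and empty 'Unnamed' columns
raw = pd.DataFrame({'Country': ['Austria', 'Belgium'],
                    1: [500.0, 300.0],
                    'Unnamed: 2': [np.nan, np.nan],
                    1000: [700.0, 400.0]})

# Column labels mix strings and integers, so convert them to strings first
is_unnamed = raw.columns.astype(str).str.startswith('Unnamed')
clean = raw.loc[:, ~is_unnamed]

print(list(clean.columns))
```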

Now, let's change the name of the columns so they reflect the underlying variable

In [20]:
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
In [21]:
maddison_old_pop
Out[21]:
Country pop_1 pop_1000 pop_1500 pop_1600 pop_1700 pop_1820 pop_1821 pop_1822 pop_1823 ... pop_2001 pop_2002 pop_2003 pop_2004 pop_2005 pop_2006 pop_2007 pop_2008 pop_2009 pop_2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 700.0 2000.0 2500.0 2500.0 3369.0 3386.0 3402.0 3419.0 ... 8131.690 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 8120.000
2 Belgium 300.0 400.0 1400.0 1600.0 2000.0 3434.0 3464.0 3495.0 3526.0 ... 10291.679 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 10409.000
3 Denmark 180.0 360.0 600.0 650.0 700.0 1155.0 1167.0 1179.0 1196.0 ... 5355.826 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 5730.488
4 Finland 20.0 40.0 300.0 400.0 400.0 1169.0 1186.0 1202.0 1219.0 ... 5180.309 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 431.170 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 177.562 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 418.454 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 732.570 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1759.756 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. 2308.205

278 rows × 197 columns

In [22]:
maddison_old_gdppc
Out[22]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 425.000000 707 837.200000 993.200000 1218.165628 NaN NaN NaN ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 425.000000 875 975.625000 1144.000000 1318.870122 NaN NaN NaN ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 400.000000 738.333 875.384615 1038.571429 1273.593074 1320.479863 1326.547922 1307.692308 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 400.000000 453.333 537.500000 637.500000 781.009410 NaN NaN NaN ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 453.402162 566.389 595.783856 614.853602 665.735330 NaN NaN NaN ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.

In [23]:
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
gdppc = gdppc.dropna(subset=['gdppc_1'])
gdppc = gdppc.loc[2:]
gdppc['Country'] = gdppc.Country.str.replace('Total', '').str.replace('Countries', '').str.replace(r'\d+', '', regex=True).str.replace('European', 'Europe').str.strip()
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and  x.find('West Asian')==-1)].reset_index(drop=True)
gdppc
Out[23]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 NaN NaN NaN ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 NaN NaN NaN ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 NaN NaN NaN ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 NaN NaN NaN ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 NaN NaN NaN ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 195 columns
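
The row selection and name cleaning above can also be sketched with the vectorized string methods str.contains and str.replace. The labels below are made up to resemble the ones in the sheet. Note that recent pandas versions treat the pattern in str.replace as literal text unless regex=True is passed, which matters for the digit pattern.

```python
import numpy as np
import pandas as pd

# Made-up labels resembling the first column of the sheet
names = pd.Series(['Total 30 Western Europe', 'Austria', np.nan, 'Total Africa'])

# Case-insensitive search; missing labels count as no match
is_total = names.str.contains('total', case=False, na=False)

cleaned = (names[is_total]
           .str.replace('Total', '', regex=False)   # literal text
           .str.replace(r'\d+', '', regex=True)     # regular expression: runs of digits
           .str.strip()
           .reset_index(drop=True))

print(cleaned.tolist())
```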

Let's drop the columns (years) that have a missing value for any of the regions.

In [24]:
gdppc = gdppc.dropna(axis=1, how='any')
gdppc
Out[24]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1870 gdppc_1900 gdppc_1913 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 1953.068150 2884.661525 3456.576178 ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 2419.152411 4014.870040 5232.816582 ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 936.628265 1437.944586 1694.879668 ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 676.005331 1113.071149 1494.431922 ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 553.459947 637.615593 695.131881 ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 500.011054 601.236364 637.433138 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 70 columns
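
To see what axis=1 and how='any' do, here is a small sketch with made-up values. With how='any' a column is dropped if at least one region is missing, while how='all' drops it only if every region is missing.

```python
import numpy as np
import pandas as pd

wide_demo = pd.DataFrame({'Country': ['Asia', 'Africa'],
                          'gdppc_1820': [580.6, 419.8],
                          'gdppc_1821': [np.nan, np.nan],
                          'gdppc_1822': [600.0, np.nan],
                          'gdppc_1870': [553.5, 500.0]})  # made-up values

complete = wide_demo.dropna(axis=1, how='any')   # keep years observed for every region
partly = wide_demo.dropna(axis=1, how='all')     # drop only years with no data at all

print(list(complete.columns))
print(list(partly.columns))
```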

Let's convert from wide to long format

In [25]:
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc
Out[25]:
Country year gdppc_
0 Western Europe 1 576.168
1 Western Offshoots 1 400
2 East Europe 1 411.789
3 Latin America 1 400
4 Asia 1 455.671
... ... ... ...
409 Western Offshoots 2008 30151.8
410 East Europe 2008 8568.97
411 Latin America 2008 6973.13
412 Asia 2008 5611.2
413 Africa 2008 1780.27

414 rows × 3 columns
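
A toy version of the reshape makes it easier to see what pd.wide_to_long does: it looks for columns that start with the stub name ('gdppc_'), moves the numeric suffix into a new column (j='year'), and stacks the values. The numbers below are rounded and only for illustration.

```python
import pandas as pd

wide_small = pd.DataFrame({'Country': ['Asia', 'Africa'],
                           'gdppc_1': [455.7, 472.4],
                           'gdppc_1000': [470.0, 424.8]})  # rounded, for illustration

long_small = pd.wide_to_long(wide_small, stubnames=['gdppc_'],
                             i='Country', j='year').reset_index()

print(long_small)
```

Each region-year pair now occupies its own row, which is the layout seaborn expects below.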

Plotting

We can now plot the data. Let's try two different ways. The first uses the plot function from pandas. The second uses the package seaborn, which improves on the capabilities of matplotlib. The main difference is how the data needs to be organized. Of course, these are not the only ways to plot and we can try others.

In [26]:
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.

In [27]:
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2
Out[27]:
Country Africa Asia East Europe Latin America Western Europe Western Offshoots
year
1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000
1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000
1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000
1600 422.071584 573.550859 548.023599 437.558140 887.906964 400.000000
1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000
... ... ... ... ... ... ...
2004 1558.099461 4661.517477 6942.136596 6063.068969 20199.220700 28807.845958
2005 1603.686517 4900.563281 7261.721015 6265.525702 20522.238008 29415.399334
2006 1663.531318 5187.253152 7730.097570 6530.533583 21087.304789 29922.741918
2007 1724.226776 5408.383588 8192.881904 6783.869986 21589.011346 30344.425293
2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880

69 rows × 6 columns
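
The pivot goes in the opposite direction of the previous reshape. A minimal sketch with made-up numbers is shown below. Since each (year, region) pair appears only once, aggfunc='sum' simply returns the single value, and pivot, which does no aggregation, gives the same table.

```python
import pandas as pd

long_demo = pd.DataFrame({'Country': ['Asia', 'Africa', 'Asia', 'Africa'],
                          'year': [1, 1, 1000, 1000],
                          'gdppc_': [455.7, 472.4, 470.0, 424.8]})  # made-up values

# pivot_table aggregates duplicates of (year, Country) with aggfunc
table = long_demo.pivot_table(index='year', columns='Country', values='gdppc_', aggfunc='sum')
# pivot only reshapes and raises an error if a pair appears more than once
same = long_demo.pivot(index='year', columns='Country', values='gdppc_')

print(table)
```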

Ok. Let's plot using the pandas plot function.

In [404]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[404]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
In [71]:
fig
Out[71]:

Now, let's use seaborn

In [72]:
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[72]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
In [73]:
fig
Out[73]:

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time use different markers for each region, and let's use only a subset of the data so that it looks better. Also, let's export the figure so we can use it in our slides.

In [74]:
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')
In [75]:
fig
Out[75]:
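
The long list of excluded years in the cell above follows a simple rule: after 1950, keep only the years that are multiples of 10, plus the last year, 2008. The sketch below checks on a plain range of years (not the actual data) that a rule-based mask selects the same years as the explicit list. The names mask_list and mask_rule are made up for this example.

```python
import pandas as pd

years = pd.Series(range(1800, 2009))

# The years excluded in the cell above: 1951-1959, 1961-1969, ..., 2001-2007
excluded = [y for y in range(1951, 2008) if y % 10 != 0]

mask_list = ~years.isin(excluded)
mask_rule = (years <= 1950) | (years % 10 == 0) | (years == 2008)

print((mask_list == mask_rule).all())
print(years[mask_rule & (years > 1950)].tolist())
```

Such a mask could replace the apply with a lambda in the data argument of sns.lineplot, which would shorten the repeated plotting cells.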

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, but the picture is similar.

In [76]:
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')
In [77]:
fig
Out[77]:

Let's show the evolution starting from other periods.

In [405]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1700) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')
In [ ]:
fig
In [406]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1500) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')
In [407]:
fig
Out[407]:
In [80]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1000) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')
In [81]:
fig
Out[81]:
In [82]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=0) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')
In [83]:
fig
Out[83]:

Let's plot the evolution of GDP per capita for the whole world.
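The cell below relies on `pd.wide_to_long` to turn the year columns into rows before plotting. Here is a minimal sketch of that reshaping on its own. It assumes the wide table has one column per year named `gdppc_<year>`. The toy frame `wide` and its three year columns are made up for illustration, and the values are the World Average figures that appear later in this section.

```python
import pandas as pd

# Toy wide table (assumed layout): one row per country, one column per year
wide = pd.DataFrame({
    'Country': ['World Average'],
    'gdppc_1': [466.752281],
    'gdppc_1000': [453.402162],
    'gdppc_1500': [566.389464],
})

# The stub 'gdppc_' is stripped from each column name; the numeric suffix becomes 'year'
long_df = (pd.wide_to_long(wide, ['gdppc_'], i='Country', j='year')
             .reset_index()
             .sort_values('year')
             .reset_index(drop=True))
print(long_df)
```

Each row of `long_df` is now one (Country, year) observation, which is the shape `sns.lineplot` expects when called with `x='year'` and `y='gdppc_'`.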

In [346]:
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=world_gdppc.loc[(world_gdppc.year>=0) & (world_gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')
In [347]:
fig
Out[347]:

Growth Rates

Let's select a subsample of years between 1 CE and 2008 and compute the average annual growth rate of income per capita in the world. We will select the years we want using the loc indexer and then use the shift method to get the data from the previous observation.
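As a sketch of the arithmetic behind this step: write y_s and y_t for income per capita in two consecutive sample years s < t (this notation is an assumption introduced here, not something the notebook uses). The average annual growth rate is then the compound rate that takes y_s to y_t in t - s years. The worked case uses the year 1 and year 1000 world values from the output table below.

```latex
g_{s,t} = \left(\frac{y_t}{y_s}\right)^{\frac{1}{t-s}} - 1,
\qquad
g_{1,1000} = \left(\frac{453.402162}{466.752281}\right)^{\frac{1}{999}} - 1 \approx -0.000029
```

In the code, `shift(1)` supplies y_s (the previous row's value) and the difference between a year and its shifted value supplies t - s.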

In [363]:
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc
Out[363]:
Country year gdppc_ Region sample mysample
0 World Average 1 466.752281 World Average 1 1
1 World Average 1000 453.402162 World Average 1 1
2 World Average 1500 566.389464 World Average 1 1
3 World Average 1600 595.783856 World Average 0 0
4 World Average 1700 614.853602 World Average 0 0
5 World Average 1820 665.735330 World Average 1 1
55 World Average 1870 869.840661 World Average 0 0
85 World Average 1900 1261.225414 World Average 0 0
98 World Average 1913 1524.430799 World Average 0 0
125 World Average 1940 1958.332103 World Average 0 0
135 World Average 1950 2110.737588 World Average 0 0
136 World Average 1951 2197.005536 World Average 0 0
137 World Average 1952 2257.980328 World Average 0 0
138 World Average 1953 2328.987906 World Average 0 0
139 World Average 1954 2363.413886 World Average 0 0
140 World Average 1955 2466.717017 World Average 0 0
141 World Average 1956 2533.823012 World Average 0 0
142 World Average 1957 2577.800948 World Average 0 0
143 World Average 1958 2606.881211 World Average 0 0
144 World Average 1959 2674.797403 World Average 0 0
145 World Average 1960 2772.580591 World Average 0 0
146 World Average 1961 2830.908766 World Average 0 0
147 World Average 1962 2913.611804 World Average 0 0
148 World Average 1963 2977.974729 World Average 0 0
149 World Average 1964 3130.248410 World Average 0 0
150 World Average 1965 3228.264786 World Average 0 0
151 World Average 1966 3335.151849 World Average 0 0
152 World Average 1967 3390.257164 World Average 0 0
153 World Average 1968 3504.504061 World Average 0 0
154 World Average 1969 3623.574255 World Average 0 0
155 World Average 1970 3729.437335 World Average 0 0
156 World Average 1971 3802.966330 World Average 0 0
157 World Average 1972 3904.438494 World Average 0 0
158 World Average 1973 4082.588683 World Average 0 0
159 World Average 1974 4099.379789 World Average 0 0
160 World Average 1975 4087.268780 World Average 0 0
161 World Average 1976 4213.391921 World Average 0 0
162 World Average 1977 4309.226536 World Average 0 0
163 World Average 1978 4422.311835 World Average 0 0
164 World Average 1979 4499.773896 World Average 0 0
165 World Average 1980 4511.738921 World Average 0 0
166 World Average 1981 4523.443373 World Average 0 0
167 World Average 1982 4501.192723 World Average 0 0
168 World Average 1983 4541.033318 World Average 0 0
169 World Average 1984 4668.174520 World Average 0 0
170 World Average 1985 4748.022390 World Average 0 0
171 World Average 1986 4832.772634 World Average 0 0
172 World Average 1987 4932.172708 World Average 0 0
173 World Average 1988 5056.279247 World Average 0 0
174 World Average 1989 5130.036080 World Average 0 0
175 World Average 1990 5149.731182 World Average 0 0
176 World Average 1991 5137.263707 World Average 0 0
177 World Average 1992 5165.331071 World Average 0 0
178 World Average 1993 5199.875607 World Average 0 0
179 World Average 1994 5303.777226 World Average 0 0
180 World Average 1995 5446.067189 World Average 0 0
181 World Average 1996 5551.774088 World Average 0 0
182 World Average 1997 5690.019960 World Average 0 0
183 World Average 1998 5708.730954 World Average 0 0
184 World Average 1999 5833.255492 World Average 0 0
185 World Average 2000 6037.675887 World Average 0 0
186 World Average 2001 6131.705471 World Average 0 0
187 World Average 2002 6261.734267 World Average 0 0
188 World Average 2003 6469.119575 World Average 0 0
189 World Average 2004 6738.281333 World Average 0 0
190 World Average 2005 6960.031035 World Average 0 0
191 World Average 2006 7238.383483 World Average 0 0
192 World Average 2007 7467.648232 World Average 0 0
193 World Average 2008 7613.922924 World Average 1 1
In [446]:
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth    
Out[446]:
         Country  year       gdppc_  Region  mysample  year_prev     growth     Period
0  World Average     1   466.752281   World         1        NaN        NaN        NaN
1  World Average  1000   453.402162   World         1      999.0  -0.000029     1-1000
2  World Average  1500   566.389464   World         1      500.0   0.000445  1000-1500
3  World Average  1820   665.735330   World         1      320.0   0.000505  1500-1820
4  World Average  1913  1524.430799   World         1       93.0   0.008948  1820-1913
In [447]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth',data=maddison_growth, alpha=1, palette=sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')
In [448]:
fig
Out[448]:

Growth of population and income (by regions)

In [464]:
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)
         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average  1     466.752281   World  1        NaN        NaN        NaN      
1  World Average  1000  453.402162   World  1         999.0     -0.000029  1-1000   
2  World Average  1500  566.389464   World  1         500.0      0.000445  1000-1500
3  World Average  1820  665.735330   World  1         320.0      0.000505  1500-1820
4  World Average  1913  1524.430799  World  1         93.0       0.008948  1820-1913
In [465]:
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)    
       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total  1     2.258200e+05  World  1        NaN        NaN        NaN      
1  World Total  1000  2.673300e+05  World  1         999.0      0.000169  1-1000   
2  World Total  1500  4.384280e+05  World  1         500.0      0.000990  1000-1500
3  World Total  1820  1.041708e+06  World  1         320.0      0.002708  1500-1820
4  World Total  1913  1.792925e+06  World  1         93.0       0.005856  1820-1913
In [442]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth
Out[442]:
   Region     Period      GDPpc  Population
1   World     1-1000  -0.000029    0.000169
2   World  1000-1500   0.000445    0.000990
3   World  1500-1820   0.000505    0.002708
4   World  1820-1913   0.008948    0.005856
In [461]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth
Out[461]:
   Region     Period           variable     growth
0   World     1-1000  Income per capita  -0.000029
1   World  1000-1500  Income per capita   0.000445
2   World  1500-1820  Income per capita   0.000505
3   World  1820-1913  Income per capita   0.008948
4   World     1-1000         Population   0.000169
5   World  1000-1500         Population   0.000990
6   World  1500-1820         Population   0.002708
7   World  1820-1913         Population   0.005856
In [462]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [463]:
fig
Out[463]:
In [491]:
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [492]:
fig
Out[492]:
In [495]:
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [496]:
fig
Out[496]:
In [497]:
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [498]:
fig
Out[498]:
In [499]:
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [500]:
fig
Out[500]:
In [501]:
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [502]:
fig
Out[502]:
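
The growth rates plotted above are annualized rates between benchmark years: the ratio of consecutive levels, raised to one over the number of years elapsed, minus one. A minimal sketch of that calculation follows; the `levels` dataframe and its numbers are made up for illustration and are not Maddison data.

```python
import pandas as pd

# Hypothetical levels at three benchmark years (not actual Maddison figures)
levels = pd.DataFrame({'year': [1, 1000, 1500],
                       'pop_': [100.0, 200.0, 400.0]})

# Years elapsed since the previous benchmark (NaN for the first row)
levels['year_prev'] = levels['year'].diff()

# Annualized growth: (x_t / x_{t-1}) ** (1 / years elapsed) - 1
levels['growth'] = (levels['pop_'] / levels['pop_'].shift(1)) ** (1 / levels['year_prev']) - 1

print(levels)
```

The first row has no previous benchmark, so its growth rate is missing; this is why the notebook drops missing values before plotting.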

Comparing richest to poorest region across time

Let's create a table that shows the GDP per capita levels for the 6 regions in the original data and compute the ratio of richest to poorest. Let's also plot it.
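
As a small illustration of the row-wise computation, the sketch below uses a hypothetical three-region slice with rounded values (the `gdppc` and `regions` names are assumptions for this example). Restricting the calculation to the region columns means that re-running the cell after the ratio column has been added would not let the ratio itself enter the max or min.

```python
import pandas as pd

# Hypothetical slice: GDP per capita by region (columns) and year (index), rounded
gdppc = pd.DataFrame(
    {'Africa': [472.4, 1780.3],
     'Western Europe': [576.2, 21671.8],
     'Western Offshoots': [400.0, 30151.8]},
    index=pd.Index([1, 2008], name='year'))

# Remember which columns are regions before adding any derived column
regions = list(gdppc.columns)

# Row-wise ratio of the richest to the poorest region
ratio = gdppc[regions].max(axis=1) / gdppc[regions].min(axis=1)
gdppc['Richest-Poorest Ratio'] = ratio

# Computing it again gives the same answer because only region columns are used
ratio_again = gdppc[regions].max(axis=1) / gdppc[regions].min(axis=1)
print(gdppc)
```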

In [350]:
gdppc2['Richest-Poorest Ratio'] = gdppc2.max(axis=1) / gdppc2.min(axis=1)
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].T
gdp_ratio = gdp_ratio.T.reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')
In [351]:
gdp_ratio
Out[351]:
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest
In [356]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True, )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')
In [357]:
fig
Out[357]:

Visualize as Table

In [39]:
gdp_ratio.style.format({
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})
Out[39]:
year 1 1000 1500 1700 1820 1870 1913 1940 1960 1980 2000 2008
Country
Africa 472.4 424.8 413.7 420.6 419.8 500.0 637.4 813.4 1,055.1 1,514.6 1,447.1 1,780.3
Asia 455.7 470.0 568.4 571.6 580.6 553.5 695.1 894.0 1,025.7 2,028.7 3,797.6 5,611.2
East Europe 411.8 400.0 496.0 606.0 683.2 936.6 1,694.9 1,968.7 3,069.8 5,785.9 5,970.2 8,569.0
Latin America 400.0 400.0 416.5 526.6 691.1 676.0 1,494.4 1,932.9 3,135.5 5,437.9 5,889.2 6,973.1
Western Europe 576.2 427.4 771.1 993.5 1,194.2 1,953.1 3,456.6 4,554.0 6,879.3 13,154.0 19,176.0 21,671.8
Western Offshoots 400.0 400.0 400.0 476.0 1,202.0 2,419.2 5,232.8 6,837.8 10,961.1 18,060.2 27,393.8 30,151.8
Richest-Poorest Ratio 1.4 1.2 1.9 2.4 2.9 4.8 8.2 8.4 10.7 11.9 18.9 16.9

Export table to LaTeX

Let's print the table as LaTeX code that can be copied and pasted in our slides or paper.
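
The export cells in this part of the notebook repeat the same format string for every year column. One possible shortcut is the `float_format` argument, which applies a single formatter to every float column. It is sketched here with `to_html` on a small hypothetical slice of the table; `to_latex` accepts the same argument. The `table` name and the two rows are assumptions for this example.

```python
import pandas as pd

# Hypothetical slice of the regions-by-year table
table = pd.DataFrame(
    [[472.352941, 1055.114678, 1780.265474],
     [576.167665, 6879.294331, 21671.774225]],
    index=['Africa', 'Western Europe'],
    columns=[1, 1960, 2008])

# One formatter for all float columns: thousands separator, one decimal
html = table.to_html(float_format='{:,.1f}'.format)
print(html)
```

A per-column dictionary remains the better choice when different columns need different formats.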

In [46]:
print(gdp_ratio.to_latex(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
}))
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}

In [47]:
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}

Export Table to HTML

In [54]:
from IPython.display import display, HTML
display(HTML(gdp_ratio.to_html(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})))
year 1 1000 1500 1700 1820 1870 1913 1940 1960 1980 2000 2008
Country
Africa 472.4 424.8 413.7 420.6 419.8 500.0 637.4 813.4 1,055.1 1,514.6 1,447.1 1,780.3
Asia 455.7 470.0 568.4 571.6 580.6 553.5 695.1 894.0 1,025.7 2,028.7 3,797.6 5,611.2
East Europe 411.8 400.0 496.0 606.0 683.2 936.6 1,694.9 1,968.7 3,069.8 5,785.9 5,970.2 8,569.0
Latin America 400.0 400.0 416.5 526.6 691.1 676.0 1,494.4 1,932.9 3,135.5 5,437.9 5,889.2 6,973.1
Western Europe 576.2 427.4 771.1 993.5 1,194.2 1,953.1 3,456.6 4,554.0 6,879.3 13,154.0 19,176.0 21,671.8
Western Offshoots 400.0 400.0 400.0 476.0 1,202.0 2,419.2 5,232.8 6,837.8 10,961.1 18,060.2 27,393.8 30,151.8
Richest-Poorest Ratio 1.4 1.2 1.9 2.4 2.9 4.8 8.2 8.4 10.7 11.9 18.9 16.9

Take-off, industrialization and reversals

Industrialization per capita

Let's create a full dataframe by entering the data by hand. The data come from Bairoch, P., 1982. "International industrialization levels from 1750 to 1980". Journal of European Economic History, 11(2), p.269. For 1750-1913, the data come from Table 9.

In [522]:
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980, the data come from Table 12.

In [523]:
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31 ,48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

Let's join both dataframes so we can plot the whole series.
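
When `merge` is called without `on`, pandas joins on every column the two dataframes share and, by default, keeps only rows that match in both. Here both tables contain `Country` and `y1913`, so both act as keys. The sketch below illustrates the consequence with two hypothetical toy tables (`early`, `late` and their values are made up): a row whose shared value differs between tables is silently dropped, while merging on `Country` alone keeps it.

```python
import pandas as pd

# Two hypothetical tables that overlap in the 'y1913' column
early = pd.DataFrame({'Country': ['A', 'B'],
                      'y1900': [10, 20],
                      'y1913': [15, 25]})
late = pd.DataFrame({'Country': ['A', 'B'],
                     'y1913': [15, 26],   # B's 1913 value differs from `early`
                     'y1928': [18, 30]})

# Default: inner join on all shared columns ('Country' and 'y1913')
default = early.merge(late)

# Explicit key: both rows survive and the overlapping column gets suffixes
by_country = early.merge(late, on='Country', suffixes=('_early', '_late'))

print(default)
print(by_country)
```

Comparing the row counts before and after a merge is a cheap way to catch rows lost this way.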

In [524]:
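# merge() without `on` joins on all shared columns (Country and y1913) and keeps only matching rows,
# so entries present in just one table (Austria-Hungary, Russia, Market Economies, U.S.S.R.) are dropped
industrialization = industrialization.merge(industrialization2)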
industrialization
Out[524]:
Country y1750 y1800 y1830 y1860 y1880 y1900 y1913 y1928 y1938 y1953 y1963 y1973 y1980
0 Developed Countries 8.0 8.0 11.0 16 24 35 55 71 81 135 194 315 344
1 Europe 8.0 8.0 11.0 17 23 33 45 76 94 107 166 260 280
2 Belgium 9.0 10.0 14.0 28 43 56 88 116 89 117 183 291 316
3 France 9.0 9.0 12.0 20 28 39 59 82 73 95 167 259 277
4 Germany 8.0 8.0 9.0 15 25 52 85 101 128 144 244 366 395
5 Italy 8.0 8.0 8.0 10 12 17 26 39 44 61 121 194 231
6 Spain 7.0 7.0 8.0 11 14 19 22 28 23 31 56 144 159
7 Sweden 7.0 8.0 9.0 15 24 41 67 84 135 163 262 405 409
8 Switzerland 7.0 10.0 16.0 26 39 67 87 90 88 167 259 366 354
9 United Kingdom 10.0 16.0 25.0 64 87 100 115 122 157 210 253 341 325
10 Canada NaN 5.0 6.0 7 10 24 46 82 84 185 237 370 379
11 United States 4.0 9.0 14.0 21 38 69 126 182 167 354 393 604 629
12 Third World 7.0 6.0 6.0 4 3 2 2 3 4 5 8 14 17
13 Japan 7.0 7.0 7.0 7 9 12 20 30 51 40 113 310 353
14 China 8.0 6.0 6.0 4 4 3 3 4 4 5 10 18 24
15 India 7.0 6.0 6.0 3 2 1 2 3 4 6 8 14 16
16 Brazil NaN NaN NaN 4 4 5 7 10 10 13 23 42 55
17 Mexico NaN NaN NaN 5 4 5 7 9 8 12 22 36 41
18 World 7.0 6.0 7.0 7 9 14 21 28 31 48 66 100 103

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.
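
As a reminder of what the reshape does, here is a minimal sketch of `pd.wide_to_long` on a hypothetical two-country table (the `wide` and `long_df` names and the values are made up). Columns named with a common stub (`y`) followed by a numeric suffix are stacked into one column, and the suffix becomes the new `year` variable.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame({'Country': ['A', 'B'],
                     'y1900': [1.0, 2.0],
                     'y1913': [3.0, 4.0]})

# Stack the y<year> columns: the stub 'y' names the value column,
# the numeric suffix goes into the new 'year' column
long_df = pd.wide_to_long(wide, stubnames=['y'], i='Country', j='year').reset_index()
long_df = long_df.rename(columns={'y': 'value'})

print(long_df)
```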

In [525]:
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)
industrialization
Out[525]:
Country year Industrialization
0 Developed Countries 1750 8.0
1 Europe 1750 8.0
2 Belgium 1750 9.0
3 France 1750 9.0
4 Germany 1750 8.0
5 Italy 1750 8.0
6 Spain 1750 7.0
7 Sweden 1750 7.0
8 Switzerland 1750 7.0
9 United Kingdom 1750 10.0
10 Canada 1750 NaN
11 United States 1750 4.0
12 Third World 1750 7.0
13 Japan 1750 7.0
14 China 1750 8.0
15 India 1750 7.0
16 Brazil 1750 NaN
17 Mexico 1750 NaN
18 World 1750 7.0
19 Developed Countries 1800 8.0
20 Europe 1800 8.0
21 Belgium 1800 10.0
22 France 1800 9.0
23 Germany 1800 8.0
24 Italy 1800 8.0
25 Spain 1800 7.0
26 Sweden 1800 8.0
27 Switzerland 1800 10.0
28 United Kingdom 1800 16.0
29 Canada 1800 5.0
30 United States 1800 9.0
31 Third World 1800 6.0
32 Japan 1800 7.0
33 China 1800 6.0
34 India 1800 6.0
35 Brazil 1800 NaN
36 Mexico 1800 NaN
37 World 1800 6.0
38 Developed Countries 1830 11.0
39 Europe 1830 11.0
40 Belgium 1830 14.0
41 France 1830 12.0
42 Germany 1830 9.0
43 Italy 1830 8.0
44 Spain 1830 8.0
45 Sweden 1830 9.0
46 Switzerland 1830 16.0
47 United Kingdom 1830 25.0
48 Canada 1830 6.0
49 United States 1830 14.0
50 Third World 1830 6.0
51 Japan 1830 7.0
52 China 1830 6.0
53 India 1830 6.0
54 Brazil 1830 NaN
55 Mexico 1830 NaN
56 World 1830 7.0
57 Developed Countries 1860 16.0
58 Europe 1860 17.0
59 Belgium 1860 28.0
60 France 1860 20.0
61 Germany 1860 15.0
62 Italy 1860 10.0
63 Spain 1860 11.0
64 Sweden 1860 15.0
65 Switzerland 1860 26.0
66 United Kingdom 1860 64.0
67 Canada 1860 7.0
68 United States 1860 21.0
69 Third World 1860 4.0
70 Japan 1860 7.0
71 China 1860 4.0
72 India 1860 3.0
73 Brazil 1860 4.0
74 Mexico 1860 5.0
75 World 1860 7.0
76 Developed Countries 1880 24.0
77 Europe 1880 23.0
78 Belgium 1880 43.0
79 France 1880 28.0
80 Germany 1880 25.0
81 Italy 1880 12.0
82 Spain 1880 14.0
83 Sweden 1880 24.0
84 Switzerland 1880 39.0
85 United Kingdom 1880 87.0
86 Canada 1880 10.0
87 United States 1880 38.0
88 Third World 1880 3.0
89 Japan 1880 9.0
90 China 1880 4.0
91 India 1880 2.0
92 Brazil 1880 4.0
93 Mexico 1880 4.0
94 World 1880 9.0
95 Developed Countries 1900 35.0
96 Europe 1900 33.0
97 Belgium 1900 56.0
98 France 1900 39.0
99 Germany 1900 52.0
100 Italy 1900 17.0
101 Spain 1900 19.0
102 Sweden 1900 41.0
103 Switzerland 1900 67.0
104 United Kingdom 1900 100.0
105 Canada 1900 24.0
106 United States 1900 69.0
107 Third World 1900 2.0
108 Japan 1900 12.0
109 China 1900 3.0
110 India 1900 1.0
111 Brazil 1900 5.0
112 Mexico 1900 5.0
113 World 1900 14.0
114 Developed Countries 1913 55.0
115 Europe 1913 45.0
116 Belgium 1913 88.0
117 France 1913 59.0
118 Germany 1913 85.0
119 Italy 1913 26.0
120 Spain 1913 22.0
121 Sweden 1913 67.0
122 Switzerland 1913 87.0
123 United Kingdom 1913 115.0
124 Canada 1913 46.0
125 United States 1913 126.0
126 Third World 1913 2.0
127 Japan 1913 20.0
128 China 1913 3.0
129 India 1913 2.0
130 Brazil 1913 7.0
131 Mexico 1913 7.0
132 World 1913 21.0
133 Developed Countries 1928 71.0
134 Europe 1928 76.0
135 Belgium 1928 116.0
136 France 1928 82.0
137 Germany 1928 101.0
138 Italy 1928 39.0
139 Spain 1928 28.0
140 Sweden 1928 84.0
141 Switzerland 1928 90.0
142 United Kingdom 1928 122.0
143 Canada 1928 82.0
144 United States 1928 182.0
145 Third World 1928 3.0
146 Japan 1928 30.0
147 China 1928 4.0
148 India 1928 3.0
149 Brazil 1928 10.0
150 Mexico 1928 9.0
151 World 1928 28.0
152 Developed Countries 1938 81.0
153 Europe 1938 94.0
154 Belgium 1938 89.0
155 France 1938 73.0
156 Germany 1938 128.0
157 Italy 1938 44.0
158 Spain 1938 23.0
159 Sweden 1938 135.0
160 Switzerland 1938 88.0
161 United Kingdom 1938 157.0
162 Canada 1938 84.0
163 United States 1938 167.0
164 Third World 1938 4.0
165 Japan 1938 51.0
166 China 1938 4.0
167 India 1938 4.0
168 Brazil 1938 10.0
169 Mexico 1938 8.0
170 World 1938 31.0
171 Developed Countries 1953 135.0
172 Europe 1953 107.0
173 Belgium 1953 117.0
174 France 1953 95.0
175 Germany 1953 144.0
176 Italy 1953 61.0
177 Spain 1953 31.0
178 Sweden 1953 163.0
179 Switzerland 1953 167.0
180 United Kingdom 1953 210.0
181 Canada 1953 185.0
182 United States 1953 354.0
183 Third World 1953 5.0
184 Japan 1953 40.0
185 China 1953 5.0
186 India 1953 6.0
187 Brazil 1953 13.0
188 Mexico 1953 12.0
189 World 1953 48.0
190 Developed Countries 1963 194.0
191 Europe 1963 166.0
192 Belgium 1963 183.0
193 France 1963 167.0
194 Germany 1963 244.0
195 Italy 1963 121.0
196 Spain 1963 56.0
197 Sweden 1963 262.0
198 Switzerland 1963 259.0
199 United Kingdom 1963 253.0
200 Canada 1963 237.0
201 United States 1963 393.0
202 Third World 1963 8.0
203 Japan 1963 113.0
204 China 1963 10.0
205 India 1963 8.0
206 Brazil 1963 23.0
207 Mexico 1963 22.0
208 World 1963 66.0
209 Developed Countries 1973 315.0
210 Europe 1973 260.0
211 Belgium 1973 291.0
212 France 1973 259.0
213 Germany 1973 366.0
214 Italy 1973 194.0
215 Spain 1973 144.0
216 Sweden 1973 405.0
217 Switzerland 1973 366.0
218 United Kingdom 1973 341.0
219 Canada 1973 370.0
220 United States 1973 604.0
221 Third World 1973 14.0
222 Japan 1973 310.0
223 China 1973 18.0
224 India 1973 14.0
225 Brazil 1973 42.0
226 Mexico 1973 36.0
227 World 1973 100.0
228 Developed Countries 1980 344.0
229 Europe 1980 280.0
230 Belgium 1980 316.0
231 France 1980 277.0
232 Germany 1980 395.0
233 Italy 1980 231.0
234 Spain 1980 159.0
235 Sweden 1980 409.0
236 Switzerland 1980 354.0
237 United Kingdom 1980 325.0
238 Canada 1980 379.0
239 United States 1980 629.0
240 Third World 1980 17.0
241 Japan 1980 353.0
242 China 1980 24.0
243 India 1980 16.0
244 Brazil 1980 55.0
245 Mexico 1980 41.0
246 World 1980 103.0
In [554]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [555]:
fig
Out[555]:
In [557]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')
In [558]:
fig
Out[558]:
In [561]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')
In [562]:
fig
Out[562]:
In [566]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 (industrialization.Country.isin(['India', 'United Kingdom'])) & 
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [567]:
fig
Out[567]:

Manufacturing

Let's use data from the same source to explore what happened to the share of manufacturing across regions.

In [575]:
# 1750-1913
manufacturing = [['Developed Countries', 27.0, 32.3, 39.5, 63.4, 79.1, 89.0, 92.5],
                 ['Europe', 23.2, 28.1, 34.2, 53.2, 61.3, 62.0, 56.6],
                 ['Austria-Hungary', 2.9, 3.2, 3.2, 4.2, 4.4, 4.7, 4.4],
                 ['Belgium', 0.3, 0.5, 0.7, 1.4, 1.8, 1.7, 1.8],
                 ['France', 4.0, 4.2, 5.2, 7.9, 7.8, 6.8, 6.1],
                 ['Germany', 2.9, 3.5, 3.5, 4.9, 8.5, 13.2, 14.8],
                 ['Italy', 2.4, 2.5, 2.3, 2.5, 2.5, 2.5, 2.4],
                 ['Russia', 5.0, 5.6, 5.6, 7.0, 7.6, 8.8, 8.2],
                 ['Spain', 1.2, 1.5, 1.5, 1.8, 1.8, 1.6, 1.2],
                 ['Sweden', 0.3, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0],
                 ['Switzerland', 0.1, 0.3, 0.4, 0.7, 0.8, 1.0, 0.9],
                 ['United Kingdom', 1.9, 4.3, 9.5, 19.9, 22.9, 18.5, 13.6],
                 ['Canada', np.nan, np.nan, 0.1, 0.3, 0.4, 0.6, 0.9],
                 ['United States', 0.1, 0.8, 2.4, 7.2, 14.7, 23.6, 32.0],
                 ['Japan', 3.8, 3.5, 2.8, 2.6, 2.4, 2.4, 2.7],
                 ['Third World', 73.0, 67.7, 60.5, 36.6, 20.9, 11.0, 7.5],
                 ['China', 32.8, 33.3, 29.8, 19.7, 12.5, 6.2, 3.6],
                 ['India', 24.5, 19.7, 17.6, 8.6, 2.8, 1.7, 1.4],
                 ['Brazil', np.nan, np.nan, np.nan, 0.4, 0.3, 0.4, 0.5],
                 ['Mexico', np.nan, np.nan, np.nan, 0.4, 0.3, 0.3, 0.3]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
manufacturing = pd.DataFrame(manufacturing, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
manufacturing2 = [['Developed Countries', 92.5, 92.8, 92.8, 93.5, 91.5, 90.1, 88.0],
                  ['Market Economies', 76.7, 80.3, 76.5, 77.5, 70.5, 70.0, 66.9],
                  ['Europe', 40.8, 35.4, 37.3, 26.1, 26.5, 24.5, 22.9],
                  ['Belgium', 1.8, 1.7, 1.1, 0.8, 0.8, 0.7, 0.7],
                  ['France', 6.1, 6.0, 4.4, 3.2, 3.8, 3.5, 3.3],
                  ['Germany', 14.8, 11.6, 12.7, 5.9, 6.4, 5.9, 5.3],
                  ['Italy', 2.4, 2.7, 2.8, 2.3, 2.9, 2.9, 2.9],
                  ['Spain', 1.2, 1.1, 0.8, 0.7, 0.8, 1.3, 1.4],
                  ['Sweden', 1.0, 0.9, 1.2, 0.9, 0.9, 0.9, 0.8],
                  ['Switzerland', 0.9, 0.7, 0.5, 0.7, 0.7, 0.6, 0.5],
                  ['United Kingdom', 13.6, 9.9, 10.7, 8.4, 6.4, 4.9, 4.0],
                  ['Canada', 0.9, 1.5, 1.4, 2.2, 2.1, 2.1, 2.0],
                  ['United States', 32.0, 39.3, 31.4, 44.7, 35.1, 33.0, 31.5],
                  ['Japan', 2.7, 3.3, 5.2, 2.9, 5.1, 8.8, 9.1],
                  ['U.S.S.R.', 8.2, 5.3, 9.0, 10.7, 14.2, 14.4, 14.8],
                  ['Third World', 7.5, 7.2, 7.2, 6.5, 8.5, 9.9, 12.0],
                  ['India', 1.4, 1.9, 2.4, 1.7, 1.8, 2.1, 2.3],
                  ['Brazil', 0.5, 0.6, 0.6, 0.6, 0.8, 1.1, 1.4],
                  ['Mexico', 0.3, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6],
                  ['China', 3.6, 3.4, 3.1, 2.3, 3.5, 3.9, 5.0]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
manufacturing2 = pd.DataFrame(manufacturing2, columns=['Country'] + ['y'+str(y) for y in years])

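# Merge (joins on the shared columns Country and y1913; Europe is dropped because
# its 1913 value differs between the two tables: 56.6 vs 40.8)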
manufacturing = manufacturing.merge(manufacturing2)
manufacturing = pd.wide_to_long(manufacturing, ['y'], i='Country', j='year').reset_index()
manufacturing.rename(columns={'y':'manufacturing'}, inplace=True)
manufacturing['manufacturing'] = manufacturing.manufacturing / 100
manufacturing
Out[575]:
Country year manufacturing
0 Developed Countries 1750 0.270
1 Belgium 1750 0.003
2 France 1750 0.040
3 Germany 1750 0.029
4 Italy 1750 0.024
5 Spain 1750 0.012
6 Sweden 1750 0.003
7 Switzerland 1750 0.001
8 United Kingdom 1750 0.019
9 Canada 1750 NaN
10 United States 1750 0.001
11 Japan 1750 0.038
12 Third World 1750 0.730
13 China 1750 0.328
14 India 1750 0.245
15 Brazil 1750 NaN
16 Mexico 1750 NaN
17 Developed Countries 1800 0.323
18 Belgium 1800 0.005
19 France 1800 0.042
20 Germany 1800 0.035
21 Italy 1800 0.025
22 Spain 1800 0.015
23 Sweden 1800 0.003
24 Switzerland 1800 0.003
25 United Kingdom 1800 0.043
26 Canada 1800 NaN
27 United States 1800 0.008
28 Japan 1800 0.035
29 Third World 1800 0.677
30 China 1800 0.333
31 India 1800 0.197
32 Brazil 1800 NaN
33 Mexico 1800 NaN
34 Developed Countries 1830 0.395
35 Belgium 1830 0.007
36 France 1830 0.052
37 Germany 1830 0.035
38 Italy 1830 0.023
39 Spain 1830 0.015
40 Sweden 1830 0.004
41 Switzerland 1830 0.004
42 United Kingdom 1830 0.095
43 Canada 1830 0.001
44 United States 1830 0.024
45 Japan 1830 0.028
46 Third World 1830 0.605
47 China 1830 0.298
48 India 1830 0.176
49 Brazil 1830 NaN
50 Mexico 1830 NaN
51 Developed Countries 1860 0.634
52 Belgium 1860 0.014
53 France 1860 0.079
54 Germany 1860 0.049
55 Italy 1860 0.025
56 Spain 1860 0.018
57 Sweden 1860 0.006
58 Switzerland 1860 0.007
59 United Kingdom 1860 0.199
60 Canada 1860 0.003
61 United States 1860 0.072
62 Japan 1860 0.026
63 Third World 1860 0.366
64 China 1860 0.197
65 India 1860 0.086
66 Brazil 1860 0.004
67 Mexico 1860 0.004
68 Developed Countries 1880 0.791
69 Belgium 1880 0.018
70 France 1880 0.078
71 Germany 1880 0.085
72 Italy 1880 0.025
73 Spain 1880 0.018
74 Sweden 1880 0.008
75 Switzerland 1880 0.008
76 United Kingdom 1880 0.229
77 Canada 1880 0.004
78 United States 1880 0.147
79 Japan 1880 0.024
80 Third World 1880 0.209
81 China 1880 0.125
82 India 1880 0.028
83 Brazil 1880 0.003
84 Mexico 1880 0.003
85 Developed Countries 1900 0.890
86 Belgium 1900 0.017
87 France 1900 0.068
88 Germany 1900 0.132
89 Italy 1900 0.025
90 Spain 1900 0.016
91 Sweden 1900 0.009
92 Switzerland 1900 0.010
93 United Kingdom 1900 0.185
94 Canada 1900 0.006
95 United States 1900 0.236
96 Japan 1900 0.024
97 Third World 1900 0.110
98 China 1900 0.062
99 India 1900 0.017
100 Brazil 1900 0.004
101 Mexico 1900 0.003
102 Developed Countries 1913 0.925
103 Belgium 1913 0.018
104 France 1913 0.061
105 Germany 1913 0.148
106 Italy 1913 0.024
107 Spain 1913 0.012
108 Sweden 1913 0.010
109 Switzerland 1913 0.009
110 United Kingdom 1913 0.136
111 Canada 1913 0.009
112 United States 1913 0.320
113 Japan 1913 0.027
114 Third World 1913 0.075
115 China 1913 0.036
116 India 1913 0.014
117 Brazil 1913 0.005
118 Mexico 1913 0.003
119 Developed Countries 1928 0.928
120 Belgium 1928 0.017
121 France 1928 0.060
122 Germany 1928 0.116
123 Italy 1928 0.027
124 Spain 1928 0.011
125 Sweden 1928 0.009
126 Switzerland 1928 0.007
127 United Kingdom 1928 0.099
128 Canada 1928 0.015
129 United States 1928 0.393
130 Japan 1928 0.033
131 Third World 1928 0.072
132 China 1928 0.034
133 India 1928 0.019
134 Brazil 1928 0.006
135 Mexico 1928 0.002
136 Developed Countries 1938 0.928
137 Belgium 1938 0.011
138 France 1938 0.044
139 Germany 1938 0.127
140 Italy 1938 0.028
141 Spain 1938 0.008
142 Sweden 1938 0.012
143 Switzerland 1938 0.005
144 United Kingdom 1938 0.107
145 Canada 1938 0.014
146 United States 1938 0.314
147 Japan 1938 0.052
148 Third World 1938 0.072
149 China 1938 0.031
150 India 1938 0.024
151 Brazil 1938 0.006
152 Mexico 1938 0.002
153 Developed Countries 1953 0.935
154 Belgium 1953 0.008
155 France 1953 0.032
156 Germany 1953 0.059
157 Italy 1953 0.023
158 Spain 1953 0.007
159 Sweden 1953 0.009
160 Switzerland 1953 0.007
161 United Kingdom 1953 0.084
162 Canada 1953 0.022
163 United States 1953 0.447
164 Japan 1953 0.029
165 Third World 1953 0.065
166 China 1953 0.023
167 India 1953 0.017
168 Brazil 1953 0.006
169 Mexico 1953 0.003
170 Developed Countries 1963 0.915
171 Belgium 1963 0.008
172 France 1963 0.038
173 Germany 1963 0.064
174 Italy 1963 0.029
175 Spain 1963 0.008
176 Sweden 1963 0.009
177 Switzerland 1963 0.007
178 United Kingdom 1963 0.064
179 Canada 1963 0.021
180 United States 1963 0.351
181 Japan 1963 0.051
182 Third World 1963 0.085
183 China 1963 0.035
184 India 1963 0.018
185 Brazil 1963 0.008
186 Mexico 1963 0.004
187 Developed Countries 1973 0.901
188 Belgium 1973 0.007
189 France 1973 0.035
190 Germany 1973 0.059
191 Italy 1973 0.029
192 Spain 1973 0.013
193 Sweden 1973 0.009
194 Switzerland 1973 0.006
195 United Kingdom 1973 0.049
196 Canada 1973 0.021
197 United States 1973 0.330
198 Japan 1973 0.088
199 Third World 1973 0.099
200 China 1973 0.039
201 India 1973 0.021
202 Brazil 1973 0.011
203 Mexico 1973 0.005
204 Developed Countries 1980 0.880
205 Belgium 1980 0.007
206 France 1980 0.033
207 Germany 1980 0.053
208 Italy 1980 0.029
209 Spain 1980 0.014
210 Sweden 1980 0.008
211 Switzerland 1980 0.005
212 United Kingdom 1980 0.040
213 Canada 1980 0.020
214 United States 1980 0.315
215 Japan 1980 0.091
216 Third World 1980 0.120
217 China 1980 0.050
218 India 1980 0.023
219 Brazil 1980 0.014
220 Mexico 1980 0.006
In [576]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [577]:
fig
Out[577]:
In [579]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

manufacturing['dev_level'] = manufacturing.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev.pdf', dpi=300, bbox_inches='tight')
In [580]:
fig
Out[580]:
In [581]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-NonDev.pdf', dpi=300, bbox_inches='tight')
In [582]:
fig
Out[582]:
In [583]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[
                 (manufacturing.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (manufacturing.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'manufacturing-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [584]:
fig
Out[584]:
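
The cells above filter rows with `.apply(lambda x: x in [...])` and classify countries with `.map(dev_level)`. The sketch below, on made-up toy data (not the values from the lecture notes), shows two things that may be useful to know: `Series.isin` builds the same boolean mask, and names missing from the mapping dictionary become `NaN`, which is why aggregates such as 'World' drop out when filtering on `dev_level`.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    'Country': ['India', 'United Kingdom', 'World', 'India', 'United Kingdom', 'World'],
    'year': [1860, 1860, 1860, 1913, 1913, 1913],
    'share': [0.10, 0.20, 1.00, 0.05, 0.15, 1.00],
})

# Series.map with a dict: names that are not keys of the dict become NaN
toy['dev_level'] = toy['Country'].map({'India': 'Developing', 'United Kingdom': 'Developed'})

# Series.isin gives the same boolean mask as .apply(lambda x: x in [...])
mask_isin = toy['Country'].isin(['India', 'United Kingdom'])
mask_apply = toy['Country'].apply(lambda x: x in ['India', 'United Kingdom'])

# Combine conditions with & and wrap each comparison in parentheses
subset = toy.loc[mask_isin & (toy['year'] <= 1900)].reset_index(drop=True)
print(subset)
```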

Industrial Potential

We can also explore the industrial potential of these countries.

In [600]:
# 1750-1913
indpotential = [['Developed Countries', 34.4, 47.4, 72.9, 143.2, 253.1, 481.2, 863.0,],
                ['Europe', 29.6, 41.2, 63.0, 120.3, 196.2, 335.4, 527.8,],
                ['Austria-Hungary', 3.7, 4.8, 5.8, 9.5, 14.0, 25.6, 40.7,],
                ['Belgium', 0.4, 0.7, 1.3, 3.1, 5.7, 9.2, 16.3,],
                ['France', 5.0, 6.2, 9.5, 17.9, 25.1, 36.8, 57.3,],
                ['Germany', 3.7, 5.2, 6.5, 11.1, 27.4, 71.2, 137.7,],
                ['Italy', 3.1, 3.7, 4.2, 5.7, 8.1, 13.6, 22.5,],
                ['Russia', 6.4, 8.3, 10.3, 15.8, 24.5, 47.5, 76.6,],
                ['Spain', 1.6, 2.1, 2.7, 4.0, 5.8, 8.5, 11.0,],
                ['Sweden', 0.3, 0.5, 0.6, 1.4, 2.6, 5.0, 9.0,],
                ['Switzerland', 0.2, 0.4, 0.8, 1.6, 2.6, 5.4, 8.0,],
                ['United Kingdom', 2.4, 6.2, 17.5, 45.0, 73.3, 100.0, 127.2,],
                ['Canada', np.nan, np.nan, 0.1, 0.6, 1.4, 3.2, 8.7,],
                ['United States', 0.1, 1.1, 4.6, 16.2, 46.9, 127.8, 298.1,],
                ['Japan', 4.8, 5.1, 5.2, 5.8, 7.6, 13.0, 25.1,],
                ['Third World', 92.9, 99.4, 111.5, 82.7, 67.0, 59.6, 69.5,],
                ['China', 41.7, 48.8, 54.9, 44.1, 39.9, 33.5, 33.3,],
                ['India', 31.2, 29.0, 32.5, 19.4, 8.8, 9.3, 13.1,],
                ['Brazil', np.nan, np.nan, np.nan, 0.9, 0.9, 2.1, 4.3,],
                ['Mexico', np.nan, np.nan, np.nan, 0.9, 0.8, 1.7, 2.7,],
                ['World', 127.3, 146.9, 184.4, 225.9, 320.1, 540.8, 932.5,]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
indpotential = pd.DataFrame(indpotential, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
indpotential2 = [['Developed Countries', 863, 1259, 1562, 2870, 4699, 8432, 9718],
                 ['Market Economies', 715, 1089, 1288, 2380, 3624, 6547, 7388],
                 ['Europe', 380, 480, 629, 801, 1361, 2290, 2529],
                 ['Belgium', 16, 22, 18, 25, 41, 69, 76],
                 ['France', 57, 82, 74, 98, 194, 328, 362],
                 ['Germany', 138, 158, 214, 180, 330, 550, 590],
                 ['Italy', 23, 37, 46, 71, 150, 258, 319],
                 ['Spain', 11, 16, 14, 22, 43, 122, 156],
                 ['Sweden', 9, 12, 21, 28, 48, 80, 83],
                 ['Switzerland', 8, 9, 9, 20, 37, 57, 54],
                 ['United Kingdom', 127, 135, 181, 258, 330, 462, 441],
                 ['Canada', 9, 20, 23, 66, 109, 199, 220],
                 ['United States', 298, 533, 528, 1373, 1804, 3089, 3475],
                 ['Japan', 25, 45, 88, 88, 264, 819, 1001],
                 ['U.S.S.R.', 77, 72, 152, 328, 760, 1345, 1630],
                 ['Third World', 70, 98, 122, 200, 439, 927, 1323],
                 ['India', 13, 26, 40, 52, 91, 194, 254],
                 ['Brazil', 4, 8, 10, 18, 42, 102, 159],
                 ['Mexico', 3, 3, 4, 9, 21, 47, 68],
                 ['China', 33, 46, 52, 71, 178, 369, 553],
                 ['World', 933, 1356, 1684, 3070, 5138, 9359, 11041]]

years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
indpotential2 = pd.DataFrame(indpotential2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge on Country; the inner join keeps only countries present in both tables
indpotential = indpotential.merge(indpotential2.drop(columns='y1913'), on='Country', how='inner')
indpotential = pd.wide_to_long(indpotential, ['y'], i='Country', j='year').reset_index()
indpotential.rename(columns={'y':'indpotential'}, inplace=True)
indpotential
Out[600]:
Country year indpotential
0 Developed Countries 1750 34.4
1 Europe 1750 29.6
2 Belgium 1750 0.4
3 France 1750 5.0
4 Germany 1750 3.7
5 Italy 1750 3.1
6 Spain 1750 1.6
7 Sweden 1750 0.3
8 Switzerland 1750 0.2
9 United Kingdom 1750 2.4
10 Canada 1750 NaN
11 United States 1750 0.1
12 Japan 1750 4.8
13 Third World 1750 92.9
14 China 1750 41.7
15 India 1750 31.2
16 Brazil 1750 NaN
17 Mexico 1750 NaN
18 World 1750 127.3
19 Developed Countries 1800 47.4
20 Europe 1800 41.2
21 Belgium 1800 0.7
22 France 1800 6.2
23 Germany 1800 5.2
24 Italy 1800 3.7
25 Spain 1800 2.1
26 Sweden 1800 0.5
27 Switzerland 1800 0.4
28 United Kingdom 1800 6.2
29 Canada 1800 NaN
30 United States 1800 1.1
31 Japan 1800 5.1
32 Third World 1800 99.4
33 China 1800 48.8
34 India 1800 29.0
35 Brazil 1800 NaN
36 Mexico 1800 NaN
37 World 1800 146.9
38 Developed Countries 1830 72.9
39 Europe 1830 63.0
40 Belgium 1830 1.3
41 France 1830 9.5
42 Germany 1830 6.5
43 Italy 1830 4.2
44 Spain 1830 2.7
45 Sweden 1830 0.6
46 Switzerland 1830 0.8
47 United Kingdom 1830 17.5
48 Canada 1830 0.1
49 United States 1830 4.6
50 Japan 1830 5.2
51 Third World 1830 111.5
52 China 1830 54.9
53 India 1830 32.5
54 Brazil 1830 NaN
55 Mexico 1830 NaN
56 World 1830 184.4
57 Developed Countries 1860 143.2
58 Europe 1860 120.3
59 Belgium 1860 3.1
60 France 1860 17.9
61 Germany 1860 11.1
62 Italy 1860 5.7
63 Spain 1860 4.0
64 Sweden 1860 1.4
65 Switzerland 1860 1.6
66 United Kingdom 1860 45.0
67 Canada 1860 0.6
68 United States 1860 16.2
69 Japan 1860 5.8
70 Third World 1860 82.7
71 China 1860 44.1
72 India 1860 19.4
73 Brazil 1860 0.9
74 Mexico 1860 0.9
75 World 1860 225.9
76 Developed Countries 1880 253.1
77 Europe 1880 196.2
78 Belgium 1880 5.7
79 France 1880 25.1
80 Germany 1880 27.4
81 Italy 1880 8.1
82 Spain 1880 5.8
83 Sweden 1880 2.6
84 Switzerland 1880 2.6
85 United Kingdom 1880 73.3
86 Canada 1880 1.4
87 United States 1880 46.9
88 Japan 1880 7.6
89 Third World 1880 67.0
90 China 1880 39.9
91 India 1880 8.8
92 Brazil 1880 0.9
93 Mexico 1880 0.8
94 World 1880 320.1
95 Developed Countries 1900 481.2
96 Europe 1900 335.4
97 Belgium 1900 9.2
98 France 1900 36.8
99 Germany 1900 71.2
100 Italy 1900 13.6
101 Spain 1900 8.5
102 Sweden 1900 5.0
103 Switzerland 1900 5.4
104 United Kingdom 1900 100.0
105 Canada 1900 3.2
106 United States 1900 127.8
107 Japan 1900 13.0
108 Third World 1900 59.6
109 China 1900 33.5
110 India 1900 9.3
111 Brazil 1900 2.1
112 Mexico 1900 1.7
113 World 1900 540.8
114 Developed Countries 1913 863.0
115 Europe 1913 527.8
116 Belgium 1913 16.3
117 France 1913 57.3
118 Germany 1913 137.7
119 Italy 1913 22.5
120 Spain 1913 11.0
121 Sweden 1913 9.0
122 Switzerland 1913 8.0
123 United Kingdom 1913 127.2
124 Canada 1913 8.7
125 United States 1913 298.1
126 Japan 1913 25.1
127 Third World 1913 69.5
128 China 1913 33.3
129 India 1913 13.1
130 Brazil 1913 4.3
131 Mexico 1913 2.7
132 World 1913 932.5
133 Developed Countries 1928 1259.0
134 Europe 1928 480.0
135 Belgium 1928 22.0
136 France 1928 82.0
137 Germany 1928 158.0
138 Italy 1928 37.0
139 Spain 1928 16.0
140 Sweden 1928 12.0
141 Switzerland 1928 9.0
142 United Kingdom 1928 135.0
143 Canada 1928 20.0
144 United States 1928 533.0
145 Japan 1928 45.0
146 Third World 1928 98.0
147 China 1928 46.0
148 India 1928 26.0
149 Brazil 1928 8.0
150 Mexico 1928 3.0
151 World 1928 1356.0
152 Developed Countries 1938 1562.0
153 Europe 1938 629.0
154 Belgium 1938 18.0
155 France 1938 74.0
156 Germany 1938 214.0
157 Italy 1938 46.0
158 Spain 1938 14.0
159 Sweden 1938 21.0
160 Switzerland 1938 9.0
161 United Kingdom 1938 181.0
162 Canada 1938 23.0
163 United States 1938 528.0
164 Japan 1938 88.0
165 Third World 1938 122.0
166 China 1938 52.0
167 India 1938 40.0
168 Brazil 1938 10.0
169 Mexico 1938 4.0
170 World 1938 1684.0
171 Developed Countries 1953 2870.0
172 Europe 1953 801.0
173 Belgium 1953 25.0
174 France 1953 98.0
175 Germany 1953 180.0
176 Italy 1953 71.0
177 Spain 1953 22.0
178 Sweden 1953 28.0
179 Switzerland 1953 20.0
180 United Kingdom 1953 258.0
181 Canada 1953 66.0
182 United States 1953 1373.0
183 Japan 1953 88.0
184 Third World 1953 200.0
185 China 1953 71.0
186 India 1953 52.0
187 Brazil 1953 18.0
188 Mexico 1953 9.0
189 World 1953 3070.0
190 Developed Countries 1963 4699.0
191 Europe 1963 1361.0
192 Belgium 1963 41.0
193 France 1963 194.0
194 Germany 1963 330.0
195 Italy 1963 150.0
196 Spain 1963 43.0
197 Sweden 1963 48.0
198 Switzerland 1963 37.0
199 United Kingdom 1963 330.0
200 Canada 1963 109.0
201 United States 1963 1804.0
202 Japan 1963 264.0
203 Third World 1963 439.0
204 China 1963 178.0
205 India 1963 91.0
206 Brazil 1963 42.0
207 Mexico 1963 21.0
208 World 1963 5138.0
209 Developed Countries 1973 8432.0
210 Europe 1973 2290.0
211 Belgium 1973 69.0
212 France 1973 328.0
213 Germany 1973 550.0
214 Italy 1973 258.0
215 Spain 1973 122.0
216 Sweden 1973 80.0
217 Switzerland 1973 57.0
218 United Kingdom 1973 462.0
219 Canada 1973 199.0
220 United States 1973 3089.0
221 Japan 1973 819.0
222 Third World 1973 927.0
223 China 1973 369.0
224 India 1973 194.0
225 Brazil 1973 102.0
226 Mexico 1973 47.0
227 World 1973 9359.0
228 Developed Countries 1980 9718.0
229 Europe 1980 2529.0
230 Belgium 1980 76.0
231 France 1980 362.0
232 Germany 1980 590.0
233 Italy 1980 319.0
234 Spain 1980 156.0
235 Sweden 1980 83.0
236 Switzerland 1980 54.0
237 United Kingdom 1980 441.0
238 Canada 1980 220.0
239 United States 1980 3475.0
240 Japan 1980 1001.0
241 Third World 1980 1323.0
242 China 1980 553.0
243 India 1980 254.0
244 Brazil 1980 159.0
245 Mexico 1980 68.0
246 World 1980 11041.0
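
The reshaping step in the cell above relies on `pd.wide_to_long`. As a minimal sketch on made-up numbers (the countries 'A' and 'B' and the column name `value` are placeholders), this is how the `y` prefix is stripped from the column names and the suffix turned into a `year` column:

```python
import pandas as pd

# Toy wide table (made-up numbers): one row per country, one column per year
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'y1900': [1.0, 2.0],
    'y1913': [3.0, 4.0],
})

# stubnames=['y'] strips the prefix; the numeric suffix becomes the 'year' index level
long_df = pd.wide_to_long(wide, stubnames=['y'], i='Country', j='year').reset_index()

# The value column keeps the stub name, so rename it to something meaningful
long_df = long_df.rename(columns={'y': 'value'})
print(long_df)
```

The result has one row per country-year pair, which is the layout `sns.lineplot` expects for its `x`, `y` and `hue` arguments.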
In [601]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [602]:
fig
Out[602]:
In [603]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

indpotential['dev_level'] = indpotential.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev.pdf', dpi=300, bbox_inches='tight')
In [604]:
fig
Out[604]:
In [605]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-NonDev.pdf', dpi=300, bbox_inches='tight')
In [606]:
fig
Out[606]:
In [607]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[
                 (indpotential.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (indpotential.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [608]:
fig
Out[608]:

Persistence

Let's explore the persistence of economic development since 1950. To do so, let's get the Penn World Table and World Bank Data.

Penn World Table

Let's start by importing the data from the Penn World Table.

In [104]:
try:
    # Use the local copies if they have already been downloaded
    pwt_xls = pd.read_excel(pathout + 'pwt91.xlsx')
    pwt = pd.read_stata(pathout + 'pwt91.dta')
except FileNotFoundError:
    # Otherwise download from the PWT website and save local copies
    pwt_xls = pd.read_excel('https://www.rug.nl/ggdc/docs/pwt91.xlsx', sheet_name=1)
    pwt = pd.read_stata('https://www.rug.nl/ggdc/docs/pwt91.dta')
    pwt_xls.to_excel(pathout + 'pwt91.xlsx', index=False)
    pwt.to_stata(pathout + 'pwt91.dta', write_index=False, version=117)

# Get labels of variables
with pd.io.stata.StataReader(pathout + 'pwt91.dta') as reader:
    pwt_labels = reader.variable_labels()

The Excel file lets us know the definitions of the variables, while the Stata file has the data. For some reason the original Stata file does not seem to have labels!
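
Since the Stata labels are empty, one option is to build the lookup from the Excel definitions instead. The sketch below uses a small stand-in table named `defs` (a few rows copied from the definitions sheet shown in this notebook); with the real data you would presumably use `pwt_xls` in its place.

```python
import pandas as pd

# Stand-in for pwt_xls: a few rows copied from the definitions sheet
defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'countrycode', 'country', 'year', 'pop'],
    'Variable definition': [None, '3-letter ISO country code', 'Country name',
                            'Year', 'Population (in millions)'],
})

# Drop section headers (rows without a definition) and build a name -> definition dict
labels = (defs.dropna(subset=['Variable definition'])
              .set_index('Variable name')['Variable definition']
              .to_dict())
print(labels['pop'])
```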

In [105]:
pwt_labels
Out[105]:
{'countrycode': '',
 'country': '',
 'currency_unit': '',
 'year': '',
 'rgdpe': '',
 'rgdpo': '',
 'pop': '',
 'emp': '',
 'avh': '',
 'hc': '',
 'ccon': '',
 'cda': '',
 'cgdpe': '',
 'cgdpo': '',
 'cn': '',
 'ck': '',
 'ctfp': '',
 'cwtfp': '',
 'rgdpna': '',
 'rconna': '',
 'rdana': '',
 'rnna': '',
 'rkna': '',
 'rtfpna': '',
 'rwtfpna': '',
 'labsh': '',
 'irr': '',
 'delta': '',
 'xr': '',
 'pl_con': '',
 'pl_da': '',
 'pl_gdpo': '',
 'i_cig': '',
 'i_xm': '',
 'i_xr': '',
 'i_outlier': '',
 'i_irr': '',
 'cor_exp': '',
 'statcap': '',
 'csh_c': '',
 'csh_i': '',
 'csh_g': '',
 'csh_x': '',
 'csh_m': '',
 'csh_r': '',
 'pl_c': '',
 'pl_i': '',
 'pl_g': '',
 'pl_x': '',
 'pl_m': '',
 'pl_n': '',
 'pl_k': ''}
In [106]:
pwt_xls
Out[106]:
Variable name Variable definition
0 Identifier variables NaN
1 countrycode 3-letter ISO country code
2 country Country name
3 currency_unit Currency unit
4 year Year
... ... ...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

67 rows × 2 columns

In [123]:
pwt
Out[123]:
countrycode country currency_unit year rgdpe rgdpo pop emp avh hc ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
0 ABW Aruba Aruban Guilder 1950 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 ABW Aruba Aruban Guilder 1951 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 ABW Aruba Aruban Guilder 1952 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 ABW Aruba Aruban Guilder 1953 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 ABW Aruba Aruban Guilder 1954 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe US Dollar 2013 28086.937500 28329.810547 15.054506 7.914061 NaN 2.504635 ... 0.169638 -0.426188 0.090225 0.577488 0.582022 0.448409 0.723247 0.632360 0.383488 0.704313
12372 ZWE Zimbabwe US Dollar 2014 29217.554688 29355.759766 15.411675 8.222112 NaN 2.550258 ... 0.141791 -0.340442 0.051500 0.600760 0.557172 0.392895 0.724510 0.628352 0.349735 0.704991
12373 ZWE Zimbabwe US Dollar 2015 30091.923828 29150.750000 15.777451 8.530669 NaN 2.584653 ... 0.137558 -0.354298 -0.023353 0.622927 0.580814 0.343926 0.654940 0.564430 0.348472 0.713156
12374 ZWE Zimbabwe US Dollar 2016 30974.292969 29420.449219 16.150362 8.839398 NaN 2.616257 ... 0.141248 -0.310446 0.003050 0.640176 0.599462 0.337853 0.657060 0.550084 0.346553 0.718671
12375 ZWE Zimbabwe US Dollar 2017 32693.474609 30940.816406 16.529903 9.181251 NaN 2.648248 ... 0.141799 -0.299539 0.019133 0.647136 0.726222 0.340680 0.645338 0.539529 0.412392 0.755215

12376 rows × 52 columns

Computing $\log$ GDP per capita

Now we can create new variables, and transform and plot the data.

To compute the $\log$ of income per capita (GDPpc), the first thing we need to know is the name of the column that contains the GDPpc data in the dataframe. To do this, let's find among the variables those which have the word capita in their description.

In [108]:
pwt_xls.columns
Out[108]:
Index(['Variable name', 'Variable definition'], dtype='object')

To be able to read the definitions better, let's tell pandas to show us more content.

In [124]:
pd.set_option("display.max_columns", 20)
pd.set_option('display.max_rows', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', None)
In [125]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('capita')!=-1)]
Out[125]:
Variable name Variable definition
12 hc Human capital index, based on years of schooling and returns to education; see Human capital in PWT9.
19 cn Capital stock at current PPPs (in mil. 2011US$)
20 ck Capital services levels at current PPPs (USA=1)
28 rnna Capital stock at constant 2011 national prices (in mil. 2011US$)
29 rkna Capital services at constant 2011 national prices (2011=1)
34 delta Average depreciation rate of the capital stock
47 i_irr 0/1/2/3: the observation for irr is not an outlier (0), may be biased due to a low capital share (1), hit the lower bound of 1 percent (2), or is an outlier (3)
53 csh_i Share of gross capital formation at current PPPs
61 pl_i Price level of capital formation, price level of USA GDPo in 2011=1
65 pl_n Price level of the capital stock, price level of USA in 2011=1
66 pl_k Price level of the capital services, price level of USA=1

So, it seems the data does not contain that variable: every match is for "capital", not "per capita". But do not panic... we know how to compute it from GDP and population. Let's do it!

Identify the name of the variable for GDP

In [129]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('GDP')!=-1)]
Out[129]:
Variable name Variable definition
7 rgdpe Expenditure-side real GDP at chained PPPs (in mil. 2011US$)
8 rgdpo Output-side real GDP at chained PPPs (in mil. 2011US$)
17 cgdpe Expenditure-side real GDP at current PPPs (in mil. 2011US$)
18 cgdpo Output-side real GDP at current PPPs (in mil. 2011US$)
25 rgdpna Real GDP at constant 2011 national prices (in mil. 2011US$)
32 labsh Share of labour compensation in GDP at current national prices
38 pl_con Price level of CCON (PPP/XR), price level of USA GDPo in 2011=1
39 pl_da Price level of CDA (PPP/XR), price level of USA GDPo in 2011=1
40 pl_gdpo Price level of CGDPo (PPP/XR), price level of USA GDPo in 2011=1
46 i_outlier 0/1: the observation on pl_gdpe or pl_gdpo is not an outlier (0) or an outlier (1)
57 csh_r Share of residual trade and GDP statistical discrepancy at current PPPs
60 pl_c Price level of household consumption, price level of USA GDPo in 2011=1
61 pl_i Price level of capital formation, price level of USA GDPo in 2011=1
62 pl_g Price level of government consumption, price level of USA GDPo in 2011=1
63 pl_x Price level of exports, price level of USA GDPo in 2011=1
64 pl_m Price level of imports, price level of USA GDPo in 2011=1

Identify the name of the variable for population

In [127]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('population')!=-1)]
Out[127]:
Variable name Variable definition
9 pop Population (in millions)
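
The three keyword searches above follow the same pattern, so they could be wrapped in a small function. This is only a sketch: `find_vars` is a hypothetical helper name, and `defs` is a stand-in for `pwt_xls` with a few definitions copied from the outputs above. It uses `Series.str.contains`, which ignores case with `case=False` and treats missing definitions as non-matches with `na=False`.

```python
import pandas as pd

# Stand-in for pwt_xls with a few definitions copied from the outputs above
defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'rgdpe', 'pop', 'hc'],
    'Variable definition': [
        None,
        'Expenditure-side real GDP at chained PPPs (in mil. 2011US$)',
        'Population (in millions)',
        'Human capital index, based on years of schooling and returns to education; see Human capital in PWT9.',
    ],
})

def find_vars(table, keyword):
    """Return the rows whose definition contains keyword, ignoring case (hypothetical helper)."""
    mask = table['Variable definition'].str.contains(keyword, case=False, na=False, regex=False)
    return table.loc[mask]

print(find_vars(defs, 'gdp'))
print(find_vars(defs, 'capita'))  # also matches 'capital', as in the search above
```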

Create new variables/columns with real GDPpc for all the measures included in PWT

In [135]:
# Get columns with GDP measures
gdpcols = pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('REAL GDP')!=-1), 'Variable name'].tolist()

# Generate GDPpc for each measure
for gdp in gdpcols:
    pwt[gdp + '_pc'] = pwt[gdp] / pwt['pop']

# GDPpc data
gdppccols = [col+'_pc' for col in gdpcols]
pwt[['countrycode', 'country', 'year'] + gdppccols]
Out[135]:
countrycode country year rgdpe_pc rgdpo_pc cgdpe_pc cgdpo_pc rgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe 2013 1865.683105 1881.816040 1874.657715 1898.868286 1952.479736
12372 ZWE Zimbabwe 2014 1895.806519 1904.774048 1918.362305 1935.120605 1947.798950
12373 ZWE Zimbabwe 2015 1907.274170 1847.621094 1924.819824 1902.378662 1934.789307
12374 ZWE Zimbabwe 2016 1917.869873 1821.658813 1932.771973 1889.612061 1901.752686
12375 ZWE Zimbabwe 2017 1977.838257 1871.808716 1998.100098 1940.005371 1913.949829

12376 rows × 8 columns
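
A quick check of the units may help here. GDP is reported in millions of 2011 US$ and population in millions, so the millions cancel and the ratio is in 2011 US$ per person. The sketch below redoes the division by hand for Zimbabwe in 2017, using the `rgdpe` and `pop` values printed in the `pwt` output earlier; the result should agree with the `rgdpe_pc` entry in the last row of the table above.

```python
# Values for Zimbabwe in 2017, copied from the pwt output shown in this notebook
rgdpe = 32693.474609   # expenditure-side real GDP, in millions of 2011 US$
pop = 16.529903        # population, in millions

# Millions cancel, so the ratio is in 2011 US$ per person
rgdpe_pc = rgdpe / pop
print(round(rgdpe_pc, 2))
```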

Now let's use the apply function to compute logs.

In [142]:
pwt[['l'+col for col in gdppccols]] = pwt[gdppccols].apply(np.log, axis=1)
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]]
Out[142]:
countrycode country year lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe 2013 7.531383 7.539993 7.536181 7.549013 7.576856
12372 ZWE Zimbabwe 2014 7.547400 7.552119 7.559227 7.567925 7.574455
12373 ZWE Zimbabwe 2015 7.553431 7.521654 7.562588 7.550860 7.567754
12374 ZWE Zimbabwe 2016 7.558970 7.507503 7.566710 7.544127 7.550531
12375 ZWE Zimbabwe 2017 7.589760 7.534660 7.599952 7.570446 7.556924

12376 rows × 8 columns
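
As an aside, the row-by-row `apply` used above is not the only way to take logs. NumPy functions such as `np.log` can be called on a whole DataFrame, which keeps the labels and missing values and is usually faster on large tables. A minimal sketch on made-up numbers (the toy values are not PWT data):

```python
import numpy as np
import pandas as pd

# Toy per-capita values (made up); NaN stands in for missing country-years
toy = pd.DataFrame({'rgdpe_pc': [1000.0, np.nan, 2000.0],
                    'rgdpo_pc': [1100.0, 1500.0, np.nan]})

# Row-by-row apply, as in the cell above
logs_apply = toy.apply(np.log, axis=1)

# Calling np.log on the DataFrame works on all cells at once
logs_direct = np.log(toy)

# Prefix the column names with 'l', mirroring the naming used in this notebook
print(logs_direct.add_prefix('l'))
```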

How correlated are these measures of log GDP per capita?

In [146]:
# Keep only the grouping key and the numeric log columns before computing correlations
pwt[['year'] + ['l'+col for col in gdppccols]].groupby('year').corr()
Out[146]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
year
1950 lrgdpe_pc 1.000000 0.996004 0.999360 0.994707 0.939644
lrgdpo_pc 0.996004 1.000000 0.995951 0.998978 0.942147
lcgdpe_pc 0.999360 0.995951 1.000000 0.995946 0.939410
lcgdpo_pc 0.994707 0.998978 0.995946 1.000000 0.943629
lrgdpna_pc 0.939644 0.942147 0.939410 0.943629 1.000000
1951 lrgdpe_pc 1.000000 0.996032 0.999418 0.994022 0.936705
lrgdpo_pc 0.996032 1.000000 0.995831 0.998796 0.938200
lcgdpe_pc 0.999418 0.995831 1.000000 0.995050 0.936603
lcgdpo_pc 0.994022 0.998796 0.995050 1.000000 0.939291
lrgdpna_pc 0.936705 0.938200 0.936603 0.939291 1.000000
1952 lrgdpe_pc 1.000000 0.996498 0.999486 0.994920 0.936988
lrgdpo_pc 0.996498 1.000000 0.996179 0.999023 0.938104
lcgdpe_pc 0.999486 0.996179 1.000000 0.995647 0.936417
lcgdpo_pc 0.994920 0.999023 0.995647 1.000000 0.938729
lrgdpna_pc 0.936988 0.938104 0.936417 0.938729 1.000000
1953 lrgdpe_pc 1.000000 0.997466 0.999431 0.996429 0.939742
lrgdpo_pc 0.997466 1.000000 0.997073 0.999098 0.940371
lcgdpe_pc 0.999431 0.997073 1.000000 0.997176 0.939380
lcgdpo_pc 0.996429 0.999098 0.997176 1.000000 0.941128
lrgdpna_pc 0.939742 0.940371 0.939380 0.941128 1.000000
1954 lrgdpe_pc 1.000000 0.992282 0.999454 0.977119 0.932511
lrgdpo_pc 0.992282 1.000000 0.991608 0.994367 0.930804
lcgdpe_pc 0.999454 0.991608 1.000000 0.977212 0.932554
lcgdpo_pc 0.977119 0.994367 0.977212 1.000000 0.916328
lrgdpna_pc 0.932511 0.930804 0.932554 0.916328 1.000000
1955 lrgdpe_pc 1.000000 0.986447 0.999441 0.974223 0.914494
lrgdpo_pc 0.986447 1.000000 0.986155 0.994564 0.918554
lcgdpe_pc 0.999441 0.986155 1.000000 0.974702 0.914119
lcgdpo_pc 0.974223 0.994564 0.974702 1.000000 0.904322
lrgdpna_pc 0.914494 0.918554 0.914119 0.904322 1.000000
1956 lrgdpe_pc 1.000000 0.989212 0.999443 0.979718 0.915022
lrgdpo_pc 0.989212 1.000000 0.988995 0.994867 0.921220
lcgdpe_pc 0.999443 0.988995 1.000000 0.980298 0.914682
lcgdpo_pc 0.979718 0.994867 0.980298 1.000000 0.909360
lrgdpna_pc 0.915022 0.921220 0.914682 0.909360 1.000000
1957 lrgdpe_pc 1.000000 0.991369 0.999443 0.980731 0.915115
lrgdpo_pc 0.991369 1.000000 0.991029 0.994925 0.921894
lcgdpe_pc 0.999443 0.991029 1.000000 0.981195 0.914591
lcgdpo_pc 0.980731 0.994925 0.981195 1.000000 0.908787
lrgdpna_pc 0.915115 0.921894 0.914591 0.908787 1.000000
1958 lrgdpe_pc 1.000000 0.993497 0.999455 0.988401 0.914426
lrgdpo_pc 0.993497 1.000000 0.993010 0.996842 0.924095
lcgdpe_pc 0.999455 0.993010 1.000000 0.988790 0.913860
lcgdpo_pc 0.988401 0.996842 0.988790 1.000000 0.915576
lrgdpna_pc 0.914426 0.924095 0.913860 0.915576 1.000000
1959 lrgdpe_pc 1.000000 0.992887 0.999453 0.989391 0.915675
lrgdpo_pc 0.992887 1.000000 0.992457 0.997297 0.923823
lcgdpe_pc 0.999453 0.992457 1.000000 0.989899 0.915662
lcgdpo_pc 0.989391 0.997297 0.989899 1.000000 0.917828
lrgdpna_pc 0.915675 0.923823 0.915662 0.917828 1.000000
1960 lrgdpe_pc 1.000000 0.984014 0.999226 0.969629 0.921319
lrgdpo_pc 0.984014 1.000000 0.983678 0.992461 0.922741
lcgdpe_pc 0.999226 0.983678 1.000000 0.970318 0.922220
lcgdpo_pc 0.969629 0.992461 0.970318 1.000000 0.905559
lrgdpna_pc 0.921319 0.922741 0.922220 0.905559 1.000000
1961 lrgdpe_pc 1.000000 0.984923 0.999332 0.975982 0.923561
lrgdpo_pc 0.984923 1.000000 0.984772 0.995101 0.926821
lcgdpe_pc 0.999332 0.984772 1.000000 0.976796 0.923956
lcgdpo_pc 0.975982 0.995101 0.976796 1.000000 0.913585
lrgdpna_pc 0.923561 0.926821 0.923956 0.913585 1.000000
1962 lrgdpe_pc 1.000000 0.990089 0.999354 0.990430 0.925668
lrgdpo_pc 0.990089 1.000000 0.989704 0.996867 0.931766
lcgdpe_pc 0.999354 0.989704 1.000000 0.991129 0.925904
lcgdpo_pc 0.990430 0.996867 0.991129 1.000000 0.925505
lrgdpna_pc 0.925668 0.931766 0.925904 0.925505 1.000000
1963 lrgdpe_pc 1.000000 0.956509 0.999375 0.955156 0.927586
lrgdpo_pc 0.956509 1.000000 0.955781 0.997625 0.905770
lcgdpe_pc 0.999375 0.955781 1.000000 0.955443 0.927904
lcgdpo_pc 0.955156 0.997625 0.955443 1.000000 0.899516
lrgdpna_pc 0.927586 0.905770 0.927904 0.899516 1.000000
1964 lrgdpe_pc 1.000000 0.986565 0.999388 0.989173 0.928045
lrgdpo_pc 0.986565 1.000000 0.986220 0.997690 0.933845
lcgdpe_pc 0.999388 0.986220 1.000000 0.989873 0.928456
lcgdpo_pc 0.989173 0.997690 0.989873 1.000000 0.930962
lrgdpna_pc 0.928045 0.933845 0.928456 0.930962 1.000000
1965 lrgdpe_pc 1.000000 0.991793 0.999426 0.995089 0.929228
lrgdpo_pc 0.991793 1.000000 0.991591 0.997405 0.938943
lcgdpe_pc 0.999426 0.991591 1.000000 0.995890 0.929622
lcgdpo_pc 0.995089 0.997405 0.995890 1.000000 0.935369
lrgdpna_pc 0.929228 0.938943 0.929622 0.935369 1.000000
1966 lrgdpe_pc 1.000000 0.992009 0.999478 0.995226 0.929149
lrgdpo_pc 0.992009 1.000000 0.991814 0.997637 0.939917
lcgdpe_pc 0.999478 0.991814 1.000000 0.995930 0.929628
lcgdpo_pc 0.995226 0.997637 0.995930 1.000000 0.936357
lrgdpna_pc 0.929149 0.939917 0.929628 0.936357 1.000000
1967 lrgdpe_pc 1.000000 0.992715 0.999520 0.996168 0.930821
lrgdpo_pc 0.992715 1.000000 0.992428 0.997633 0.941828
lcgdpe_pc 0.999520 0.992428 1.000000 0.996712 0.931097
lcgdpo_pc 0.996168 0.997633 0.996712 1.000000 0.937729
lrgdpna_pc 0.930821 0.941828 0.931097 0.937729 1.000000
1968 lrgdpe_pc 1.000000 0.992486 0.999540 0.996314 0.931315
lrgdpo_pc 0.992486 1.000000 0.992150 0.997702 0.942718
lcgdpe_pc 0.999540 0.992150 1.000000 0.996773 0.931588
lcgdpo_pc 0.996314 0.997702 0.996773 1.000000 0.938872
lrgdpna_pc 0.931315 0.942718 0.931588 0.938872 1.000000
1969 lrgdpe_pc 1.000000 0.990760 0.999559 0.995287 0.929009
lrgdpo_pc 0.990760 1.000000 0.990488 0.997613 0.942404
lcgdpe_pc 0.999559 0.990488 1.000000 0.995767 0.929356
lcgdpo_pc 0.995287 0.997613 0.995767 1.000000 0.938224
lrgdpna_pc 0.929009 0.942404 0.929356 0.938224 1.000000
1970 lrgdpe_pc 1.000000 0.953804 0.999510 0.988711 0.939976
lrgdpo_pc 0.953804 1.000000 0.952917 0.953341 0.911318
lcgdpe_pc 0.999510 0.952917 1.000000 0.989067 0.940254
lcgdpo_pc 0.988711 0.953341 0.989067 1.000000 0.944074
lrgdpna_pc 0.939976 0.911318 0.940254 0.944074 1.000000
1971 lrgdpe_pc 1.000000 0.948131 0.999504 0.992126 0.941310
lrgdpo_pc 0.948131 1.000000 0.947044 0.955796 0.903082
lcgdpe_pc 0.999504 0.947044 1.000000 0.992295 0.941618
lcgdpo_pc 0.992126 0.955796 0.992295 1.000000 0.944891
lrgdpna_pc 0.941310 0.903082 0.941618 0.944891 1.000000
1972 lrgdpe_pc 1.000000 0.952069 0.999511 0.992482 0.941521
lrgdpo_pc 0.952069 1.000000 0.951006 0.955554 0.907841
lcgdpe_pc 0.999511 0.951006 1.000000 0.992649 0.941877
lcgdpo_pc 0.992482 0.955554 0.992649 1.000000 0.946291
lrgdpna_pc 0.941521 0.907841 0.941877 0.946291 1.000000
1973 lrgdpe_pc 1.000000 0.953576 0.999506 0.992400 0.942611
lrgdpo_pc 0.953576 1.000000 0.952552 0.956208 0.910535
lcgdpe_pc 0.999506 0.952552 1.000000 0.992677 0.942997
lcgdpo_pc 0.992400 0.956208 0.992677 1.000000 0.949107
lrgdpna_pc 0.942611 0.910535 0.942997 0.949107 1.000000
1974 lrgdpe_pc 1.000000 0.958259 0.999463 0.993400 0.950298
lrgdpo_pc 0.958259 1.000000 0.957314 0.953643 0.912891
lcgdpe_pc 0.999463 0.957314 1.000000 0.994041 0.950522
lcgdpo_pc 0.993400 0.953643 0.994041 1.000000 0.953455
lrgdpna_pc 0.950298 0.912891 0.950522 0.953455 1.000000
1975 lrgdpe_pc 1.000000 0.958704 0.999382 0.994493 0.948101
lrgdpo_pc 0.958704 1.000000 0.957652 0.953390 0.910922
lcgdpe_pc 0.999382 0.957652 1.000000 0.995245 0.948434
lcgdpo_pc 0.994493 0.953390 0.995245 1.000000 0.951763
lrgdpna_pc 0.948101 0.910922 0.948434 0.951763 1.000000
1976 lrgdpe_pc 1.000000 0.957038 0.999430 0.994498 0.949717
lrgdpo_pc 0.957038 1.000000 0.955909 0.954837 0.911050
lcgdpe_pc 0.999430 0.955909 1.000000 0.995048 0.949718
lcgdpo_pc 0.994498 0.954837 0.995048 1.000000 0.953010
lrgdpna_pc 0.949717 0.911050 0.949718 0.953010 1.000000
1977 lrgdpe_pc 1.000000 0.958375 0.999466 0.994255 0.950237
lrgdpo_pc 0.958375 1.000000 0.957579 0.954960 0.912132
lcgdpe_pc 0.999466 0.957579 1.000000 0.995068 0.950062
lcgdpo_pc 0.994255 0.954960 0.995068 1.000000 0.952192
lrgdpna_pc 0.950237 0.912132 0.950062 0.952192 1.000000
1978 lrgdpe_pc 1.000000 0.959615 0.999460 0.994532 0.950197
lrgdpo_pc 0.959615 1.000000 0.958739 0.955213 0.912749
lcgdpe_pc 0.999460 0.958739 1.000000 0.995297 0.950110
lcgdpo_pc 0.994532 0.955213 0.995297 1.000000 0.952213
lrgdpna_pc 0.950197 0.912749 0.950110 0.952213 1.000000
1979 lrgdpe_pc 1.000000 0.961731 0.999489 0.994741 0.954619
lrgdpo_pc 0.961731 1.000000 0.960856 0.956228 0.915480
lcgdpe_pc 0.999489 0.960856 1.000000 0.995490 0.954477
lcgdpo_pc 0.994741 0.956228 0.995490 1.000000 0.954016
lrgdpna_pc 0.954619 0.915480 0.954477 0.954016 1.000000
1980 lrgdpe_pc 1.000000 0.965874 0.999521 0.993728 0.955749
lrgdpo_pc 0.965874 1.000000 0.965306 0.955175 0.919056
lcgdpe_pc 0.999521 0.965306 1.000000 0.994768 0.955565
lcgdpo_pc 0.993728 0.955175 0.994768 1.000000 0.953062
lrgdpna_pc 0.955749 0.919056 0.955565 0.953062 1.000000
1981 lrgdpe_pc 1.000000 0.966139 0.999528 0.994175 0.954581
lrgdpo_pc 0.966139 1.000000 0.965430 0.954363 0.915933
lcgdpe_pc 0.999528 0.965430 1.000000 0.995066 0.954489
lcgdpo_pc 0.994175 0.954363 0.995066 1.000000 0.950539
lrgdpna_pc 0.954581 0.915933 0.954489 0.950539 1.000000
1982 lrgdpe_pc 1.000000 0.967150 0.999511 0.994322 0.953473
lrgdpo_pc 0.967150 1.000000 0.966314 0.954329 0.915021
lcgdpe_pc 0.999511 0.966314 1.000000 0.995145 0.953435
lcgdpo_pc 0.994322 0.954329 0.995145 1.000000 0.950020
lrgdpna_pc 0.953473 0.915021 0.953435 0.950020 1.000000
1983 lrgdpe_pc 1.000000 0.967255 0.999527 0.995445 0.953260
lrgdpo_pc 0.967255 1.000000 0.966338 0.955252 0.914873
lcgdpe_pc 0.999527 0.966338 1.000000 0.996162 0.953157
lcgdpo_pc 0.995445 0.955252 0.996162 1.000000 0.951287
lrgdpna_pc 0.953260 0.914873 0.953157 0.951287 1.000000
1984 lrgdpe_pc 1.000000 0.966685 0.999540 0.995971 0.955695
lrgdpo_pc 0.966685 1.000000 0.965622 0.956083 0.915925
lcgdpe_pc 0.999540 0.965622 1.000000 0.996518 0.955465
lcgdpo_pc 0.995971 0.956083 0.996518 1.000000 0.953300
lrgdpna_pc 0.955695 0.915925 0.955465 0.953300 1.000000
1985 lrgdpe_pc 1.000000 0.969321 0.999526 0.993918 0.952957
lrgdpo_pc 0.969321 1.000000 0.968393 0.955792 0.915974
lcgdpe_pc 0.999526 0.968393 1.000000 0.994633 0.952802
lcgdpo_pc 0.993918 0.955792 0.994633 1.000000 0.949973
lrgdpna_pc 0.952957 0.915974 0.952802 0.949973 1.000000
1986 lrgdpe_pc 1.000000 0.967935 0.999566 0.996950 0.948449
lrgdpo_pc 0.967935 1.000000 0.966972 0.959844 0.910415
lcgdpe_pc 0.999566 0.966972 1.000000 0.997389 0.948772
lcgdpo_pc 0.996950 0.959844 0.997389 1.000000 0.949417
lrgdpna_pc 0.948449 0.910415 0.948772 0.949417 1.000000
1987 lrgdpe_pc 1.000000 0.964279 0.999597 0.997905 0.951867
lrgdpo_pc 0.964279 1.000000 0.963144 0.962962 0.911039
lcgdpe_pc 0.999597 0.963144 1.000000 0.998023 0.952306
lcgdpo_pc 0.997905 0.962962 0.998023 1.000000 0.954210
lrgdpna_pc 0.951867 0.911039 0.952306 0.954210 1.000000
1988 lrgdpe_pc 1.000000 0.968710 0.999617 0.996894 0.949698
lrgdpo_pc 0.968710 1.000000 0.967699 0.962665 0.912729
lcgdpe_pc 0.999617 0.967699 1.000000 0.997109 0.950227
lcgdpo_pc 0.996894 0.962665 0.997109 1.000000 0.950695
lrgdpna_pc 0.949698 0.912729 0.950227 0.950695 1.000000
1989 lrgdpe_pc 1.000000 0.977008 0.999662 0.995345 0.948742
lrgdpo_pc 0.977008 1.000000 0.976213 0.964434 0.919073
lcgdpe_pc 0.999662 0.976213 1.000000 0.995620 0.949361
lcgdpo_pc 0.995345 0.964434 0.995620 1.000000 0.946686
lrgdpna_pc 0.948742 0.919073 0.949361 0.946686 1.000000
1990 lrgdpe_pc 1.000000 0.973171 0.999567 0.996020 0.950278
lrgdpo_pc 0.973171 1.000000 0.972605 0.968616 0.921552
lcgdpe_pc 0.999567 0.972605 1.000000 0.996404 0.951025
lcgdpo_pc 0.996020 0.968616 0.996404 1.000000 0.952127
lrgdpna_pc 0.950278 0.921552 0.951025 0.952127 1.000000
1991 lrgdpe_pc 1.000000 0.972712 0.999573 0.996826 0.947826
lrgdpo_pc 0.972712 1.000000 0.972117 0.969926 0.918047
lcgdpe_pc 0.999573 0.972117 1.000000 0.997041 0.948603
lcgdpo_pc 0.996826 0.969926 0.997041 1.000000 0.945643
lrgdpna_pc 0.947826 0.918047 0.948603 0.945643 1.000000
1992 lrgdpe_pc 1.000000 0.975500 0.999604 0.995404 0.948009
lrgdpo_pc 0.975500 1.000000 0.974931 0.968785 0.920985
lcgdpe_pc 0.999604 0.974931 1.000000 0.995848 0.948623
lcgdpo_pc 0.995404 0.968785 0.995848 1.000000 0.950299
lrgdpna_pc 0.948009 0.920985 0.948623 0.950299 1.000000
1993 lrgdpe_pc 1.000000 0.976365 0.999592 0.996510 0.947115
lrgdpo_pc 0.976365 1.000000 0.975782 0.969220 0.920227
lcgdpe_pc 0.999592 0.975782 1.000000 0.996890 0.948043
lcgdpo_pc 0.996510 0.969220 0.996890 1.000000 0.947414
lrgdpna_pc 0.947115 0.920227 0.948043 0.947414 1.000000
1994 lrgdpe_pc 1.000000 0.984722 0.999647 0.993599 0.948043
lrgdpo_pc 0.984722 1.000000 0.984318 0.971116 0.927974
lcgdpe_pc 0.999647 0.984318 1.000000 0.993996 0.948706
lcgdpo_pc 0.993599 0.971116 0.993996 1.000000 0.943670
lrgdpna_pc 0.948043 0.927974 0.948706 0.943670 1.000000
1995 lrgdpe_pc 1.000000 0.978602 0.999649 0.997035 0.948032
lrgdpo_pc 0.978602 1.000000 0.978231 0.973086 0.923555
lcgdpe_pc 0.999649 0.978231 1.000000 0.997333 0.948558
lcgdpo_pc 0.997035 0.973086 0.997333 1.000000 0.947735
lrgdpna_pc 0.948032 0.923555 0.948558 0.947735 1.000000
1996 lrgdpe_pc 1.000000 0.981669 0.999605 0.996421 0.949408
lrgdpo_pc 0.981669 1.000000 0.981171 0.973080 0.928391
lcgdpe_pc 0.999605 0.981171 1.000000 0.996588 0.949658
lcgdpo_pc 0.996421 0.973080 0.996588 1.000000 0.948365
lrgdpna_pc 0.949408 0.928391 0.949658 0.948365 1.000000
1997 lrgdpe_pc 1.000000 0.979698 0.999687 0.996474 0.951719
lrgdpo_pc 0.979698 1.000000 0.979320 0.973031 0.929416
lcgdpe_pc 0.999687 0.979320 1.000000 0.996637 0.952101
lcgdpo_pc 0.996474 0.973031 0.996637 1.000000 0.951636
lrgdpna_pc 0.951719 0.929416 0.952101 0.951636 1.000000
1998 lrgdpe_pc 1.000000 0.992275 0.999735 0.976788 0.950113
lrgdpo_pc 0.992275 1.000000 0.992048 0.974894 0.939415
lcgdpe_pc 0.999735 0.992048 1.000000 0.977036 0.950714
lcgdpo_pc 0.976788 0.974894 0.977036 1.000000 0.932352
lrgdpna_pc 0.950113 0.939415 0.950714 0.932352 1.000000
1999 lrgdpe_pc 1.000000 0.992615 0.999767 0.996493 0.959380
lrgdpo_pc 0.992615 1.000000 0.992421 0.989367 0.950709
lcgdpe_pc 0.999767 0.992421 1.000000 0.996696 0.960000
lcgdpo_pc 0.996493 0.989367 0.996696 1.000000 0.963886
lrgdpna_pc 0.959380 0.950709 0.960000 0.963886 1.000000
2000 lrgdpe_pc 1.000000 0.987545 0.999801 0.997036 0.968434
lrgdpo_pc 0.987545 1.000000 0.987358 0.988148 0.954910
lcgdpe_pc 0.999801 0.987358 1.000000 0.997250 0.968962
lcgdpo_pc 0.997036 0.988148 0.997250 1.000000 0.973455
lrgdpna_pc 0.968434 0.954910 0.968962 0.973455 1.000000
2001 lrgdpe_pc 1.000000 0.989175 0.999817 0.995943 0.970442
lrgdpo_pc 0.989175 1.000000 0.989097 0.987931 0.958022
lcgdpe_pc 0.999817 0.989097 1.000000 0.996241 0.971040
lcgdpo_pc 0.995943 0.987931 0.996241 1.000000 0.973828
lrgdpna_pc 0.970442 0.958022 0.971040 0.973828 1.000000
2002 lrgdpe_pc 1.000000 0.983819 0.999827 0.996273 0.971843
lrgdpo_pc 0.983819 1.000000 0.983707 0.978909 0.953462
lcgdpe_pc 0.999827 0.983707 1.000000 0.996571 0.972518
lcgdpo_pc 0.996273 0.978909 0.996571 1.000000 0.977277
lrgdpna_pc 0.971843 0.953462 0.972518 0.977277 1.000000
2003 lrgdpe_pc 1.000000 0.978632 0.999868 0.974932 0.976359
lrgdpo_pc 0.978632 1.000000 0.978396 0.999426 0.971835
lcgdpe_pc 0.999868 0.978396 1.000000 0.975027 0.976941
lcgdpo_pc 0.974932 0.999426 0.975027 1.000000 0.970719
lrgdpna_pc 0.976359 0.971835 0.976941 0.970719 1.000000
2004 lrgdpe_pc 1.000000 0.996461 0.999892 0.984212 0.980907
lrgdpo_pc 0.996461 1.000000 0.996375 0.988905 0.982339
lcgdpe_pc 0.999892 0.996375 1.000000 0.984481 0.981342
lcgdpo_pc 0.984212 0.988905 0.984481 1.000000 0.971810
lrgdpna_pc 0.980907 0.982339 0.981342 0.971810 1.000000
2005 lrgdpe_pc 1.000000 0.997340 0.999927 0.996069 0.984843
lrgdpo_pc 0.997340 1.000000 0.997305 0.999633 0.987416
lcgdpe_pc 0.999927 0.997305 1.000000 0.996212 0.985014
lcgdpo_pc 0.996069 0.999633 0.996212 1.000000 0.988369
lrgdpna_pc 0.984843 0.987416 0.985014 0.988369 1.000000
2006 lrgdpe_pc 1.000000 0.994939 0.999944 0.994302 0.987993
lrgdpo_pc 0.994939 1.000000 0.994873 0.999840 0.990800
lcgdpe_pc 0.999944 0.994873 1.000000 0.994366 0.988081
lcgdpo_pc 0.994302 0.999840 0.994366 1.000000 0.990627
lrgdpna_pc 0.987993 0.990800 0.988081 0.990627 1.000000
2007 lrgdpe_pc 1.000000 0.995287 0.999965 0.995595 0.989327
lrgdpo_pc 0.995287 1.000000 0.995249 0.999902 0.993414
lcgdpe_pc 0.999965 0.995249 1.000000 0.995628 0.989402
lcgdpo_pc 0.995595 0.999902 0.995628 1.000000 0.993250
lrgdpna_pc 0.989327 0.993414 0.989402 0.993250 1.000000
2008 lrgdpe_pc 1.000000 0.996801 0.999976 0.996696 0.991798
lrgdpo_pc 0.996801 1.000000 0.996789 0.999962 0.995522
lcgdpe_pc 0.999976 0.996789 1.000000 0.996731 0.991815
lcgdpo_pc 0.996696 0.999962 0.996731 1.000000 0.995429
lrgdpna_pc 0.991798 0.995522 0.991815 0.995429 1.000000
2009 lrgdpe_pc 1.000000 0.996792 0.999989 0.997275 0.990339
lrgdpo_pc 0.996792 1.000000 0.996780 0.999918 0.995473
lcgdpe_pc 0.999989 0.996780 1.000000 0.997283 0.990441
lcgdpo_pc 0.997275 0.999918 0.997283 1.000000 0.995430
lrgdpna_pc 0.990339 0.995473 0.990441 0.995430 1.000000
2010 lrgdpe_pc 1.000000 0.995424 0.999997 0.995880 0.993624
lrgdpo_pc 0.995424 1.000000 0.995433 0.999959 0.998471
lcgdpe_pc 0.999997 0.995433 1.000000 0.995891 0.993655
lcgdpo_pc 0.995880 0.999959 0.995891 1.000000 0.998296
lrgdpna_pc 0.993624 0.998471 0.993655 0.998296 1.000000
2011 lrgdpe_pc 1.000000 0.995624 1.000000 0.995624 0.995624
lrgdpo_pc 0.995624 1.000000 0.995624 1.000000 1.000000
lcgdpe_pc 1.000000 0.995624 1.000000 0.995624 0.995624
lcgdpo_pc 0.995624 1.000000 0.995624 1.000000 1.000000
lrgdpna_pc 0.995624 1.000000 0.995624 1.000000 1.000000
2012 lrgdpe_pc 1.000000 0.992756 0.999996 0.992877 0.995411
lrgdpo_pc 0.992756 1.000000 0.992728 0.999980 0.996876
lcgdpe_pc 0.999996 0.992728 1.000000 0.992858 0.995391
lcgdpo_pc 0.992877 0.999980 0.992858 1.000000 0.996911
lrgdpna_pc 0.995411 0.996876 0.995391 0.996911 1.000000
2013 lrgdpe_pc 1.000000 0.995051 0.999983 0.995519 0.994929
lrgdpo_pc 0.995051 1.000000 0.995020 0.999933 0.998438
lcgdpe_pc 0.999983 0.995020 1.000000 0.995528 0.994857
lcgdpo_pc 0.995519 0.999933 0.995528 1.000000 0.998398
lrgdpna_pc 0.994929 0.998438 0.994857 0.998398 1.000000
2014 lrgdpe_pc 1.000000 0.982848 0.999965 0.983670 0.993826
lrgdpo_pc 0.982848 1.000000 0.982648 0.999939 0.989035
lcgdpe_pc 0.999965 0.982648 1.000000 0.983543 0.993708
lcgdpo_pc 0.983670 0.999939 0.983543 1.000000 0.989520
lrgdpna_pc 0.993826 0.989035 0.993708 0.989520 1.000000
2015 lrgdpe_pc 1.000000 0.988797 0.999953 0.990396 0.990281
lrgdpo_pc 0.988797 1.000000 0.988647 0.999737 0.991025
lcgdpe_pc 0.999953 0.988647 1.000000 0.990317 0.990045
lcgdpo_pc 0.990396 0.999737 0.990317 1.000000 0.992101
lrgdpna_pc 0.990281 0.991025 0.990045 0.992101 1.000000
2016 lrgdpe_pc 1.000000 0.976871 0.999936 0.981575 0.989787
lrgdpo_pc 0.976871 1.000000 0.976590 0.999441 0.983379
lcgdpe_pc 0.999936 0.976590 1.000000 0.981411 0.989453
lcgdpo_pc 0.981575 0.999441 0.981411 1.000000 0.986274
lrgdpna_pc 0.989787 0.983379 0.989453 0.986274 1.000000
2017 lrgdpe_pc 1.000000 0.975165 0.999933 0.978183 0.990955
lrgdpo_pc 0.975165 1.000000 0.974924 0.999628 0.982313
lcgdpe_pc 0.999933 0.974924 1.000000 0.978034 0.990629
lcgdpo_pc 0.978183 0.999628 0.978034 1.000000 0.984534
lrgdpna_pc 0.990955 0.982313 0.990629 0.984534 1.000000

While the measures seem highly correlated, it is hard to see this directly in such a long table. Let's get summary statistics of each measure's correlations across all years.

In [145]:
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]].groupby('year').corr().describe()
Out[145]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
count 340.000000 340.000000 340.000000 340.000000 340.000000
mean 0.984764 0.976230 0.984787 0.983319 0.959519
std 0.021766 0.026951 0.021799 0.022352 0.031102
min 0.914426 0.903082 0.913860 0.899516 0.899516
25% 0.976850 0.958346 0.976905 0.973735 0.936402
50% 0.995482 0.988851 0.995799 0.995067 0.951741
75% 0.999805 0.998989 0.999805 0.998989 0.994875
max 1.000000 1.000000 1.000000 1.000000 1.000000
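
The summary above pools every cell of every yearly correlation matrix, including the diagonal of ones and both copies of each symmetric pair. Here is a minimal sketch of one way to summarize only the distinct pairs, using a small synthetic panel (the frame `toy` and the column names `a`, `b`, `c` are made up for illustration and are not the PWT data). Selecting the numeric columns explicitly also avoids depending on how a given pandas version treats string columns in `corr`.

```python
import numpy as np
import pandas as pd

# Synthetic panel: 3 years x 50 units, three correlated measures (illustrative only)
rng = np.random.default_rng(0)
frames = []
for year in (2000, 2001, 2002):
    base = rng.normal(size=50)
    frames.append(pd.DataFrame({
        'year': year,
        'a': base,
        'b': base + 0.1 * rng.normal(size=50),
        'c': base + 0.5 * rng.normal(size=50),
    }))
toy = pd.concat(frames, ignore_index=True)

measures = ['a', 'b', 'c']
# One correlation matrix per year, stacked with a (year, measure) MultiIndex
corr_by_year = toy.groupby('year')[measures].corr()

# Keep only the cells strictly above the diagonal of each yearly matrix
upper = np.triu(np.ones((len(measures), len(measures)), dtype=bool), k=1)
pairs = {}
for year, block in corr_by_year.groupby(level=0):
    values = block.to_numpy()
    for i, j in zip(*np.where(upper)):
        pairs[(year, measures[i] + '-' + measures[j])] = values[i, j]

pair_corr = pd.Series(pairs).unstack()   # rows: years, columns: measure pairs
summary = pair_corr.describe()
print(summary)
```

With the diagonal and the duplicated cells removed, `count` equals the number of years and `max` is no longer forced to 1.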

Ok. This gives us a better sense of how strongly correlated these measures of log GDP per capita are. In what follows we will use only one, namely Log[GDPpc] based on Expenditure-side real GDP at chained PPPs (in mil. 2011US$), i.e., lrgdpe_pc.

Convergence post-1960?

Let's start by looking at the distribution of Log[GDPpc] in 1960. For this we need to subset our dataframe and select only the rows for the year 1960. This is done with the loc property of the dataframe.

In [150]:
gdppc1960 = pwt.loc[pwt.year==1960, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
gdppc1960
Out[150]:
countrycode country year lrgdpe_pc
10 ABW Aruba 1960 NaN
78 AGO Angola 1960 NaN
146 AIA Anguilla 1960 NaN
214 ALB Albania 1960 NaN
282 ARE United Arab Emirates 1960 NaN
350 ARG Argentina 1960 7.979403
418 ARM Armenia 1960 NaN
486 ATG Antigua and Barbuda 1960 NaN
554 AUS Australia 1960 9.568694
622 AUT Austria 1960 9.136169
690 AZE Azerbaijan 1960 NaN
758 BDI Burundi 1960 6.509274
826 BEL Belgium 1960 9.255895
894 BEN Benin 1960 7.355827
962 BFA Burkina Faso 1960 6.514636
1030 BGD Bangladesh 1960 7.291890
1098 BGR Bulgaria 1960 NaN
1166 BHR Bahrain 1960 NaN
1234 BHS Bahamas 1960 NaN
1302 BIH Bosnia and Herzegovina 1960 NaN
1370 BLR Belarus 1960 NaN
1438 BLZ Belize 1960 NaN
1506 BMU Bermuda 1960 NaN
1574 BOL Bolivia (Plurinational State of) 1960 7.266469
1642 BRA Brazil 1960 7.708365
1710 BRB Barbados 1960 9.064743
1778 BRN Brunei Darussalam 1960 NaN
1846 BTN Bhutan 1960 NaN
1914 BWA Botswana 1960 6.051669
1982 CAF Central African Republic 1960 7.207524
2050 CAN Canada 1960 9.514649
2118 CHE Switzerland 1960 9.913402
2186 CHL Chile 1960 8.539205
2254 CHN China 1960 6.933300
2322 CIV Côte d'Ivoire 1960 7.522175
2390 CMR Cameroon 1960 7.270099
2458 COD D.R. of the Congo 1960 7.851367
2526 COG Congo 1960 6.866796
2594 COL Colombia 1960 8.139690
2662 COM Comoros 1960 7.404435
2730 CPV Cabo Verde 1960 6.919717
2798 CRI Costa Rica 1960 8.447079
2866 CUW Curaçao 1960 NaN
2934 CYM Cayman Islands 1960 NaN
3002 CYP Cyprus 1960 8.228935
3070 CZE Czech Republic 1960 NaN
3138 DEU Germany 1960 9.235772
3206 DJI Djibouti 1960 NaN
3274 DMA Dominica 1960 NaN
3342 DNK Denmark 1960 9.449154
3410 DOM Dominican Republic 1960 7.761885
3478 DZA Algeria 1960 8.547341
3546 ECU Ecuador 1960 8.284314
3614 EGY Egypt 1960 6.507276
3682 ESP Spain 1960 8.644430
3750 EST Estonia 1960 NaN
3818 ETH Ethiopia 1960 6.240098
3886 FIN Finland 1960 9.124974
3954 FJI Fiji 1960 7.901760
4022 FRA France 1960 9.251851
4090 GAB Gabon 1960 7.892188
4158 GBR United Kingdom 1960 9.378080
4226 GEO Georgia 1960 NaN
4294 GHA Ghana 1960 8.334852
4362 GIN Guinea 1960 7.956198
4430 GMB Gambia 1960 7.845465
4498 GNB Guinea-Bissau 1960 6.818258
4566 GNQ Equatorial Guinea 1960 7.096201
4634 GRC Greece 1960 8.490119
4702 GRD Grenada 1960 NaN
4770 GTM Guatemala 1960 7.755492
4838 HKG China, Hong Kong SAR 1960 8.242821
4906 HND Honduras 1960 7.623521
4974 HRV Croatia 1960 NaN
5042 HTI Haiti 1960 7.114264
5110 HUN Hungary 1960 NaN
5178 IDN Indonesia 1960 6.851088
5246 IND India 1960 6.935655
5314 IRL Ireland 1960 8.768335
5382 IRN Iran (Islamic Republic of) 1960 7.854090
5450 IRQ Iraq 1960 NaN
5518 ISL Iceland 1960 9.286066
5586 ISR Israel 1960 9.061831
5654 ITA Italy 1960 8.877171
5722 JAM Jamaica 1960 8.549061
5790 JOR Jordan 1960 7.934945
5858 JPN Japan 1960 8.578591
5926 KAZ Kazakhstan 1960 NaN
5994 KEN Kenya 1960 7.467464
6062 KGZ Kyrgyzstan 1960 NaN
6130 KHM Cambodia 1960 NaN
6198 KNA Saint Kitts and Nevis 1960 NaN
6266 KOR Republic of Korea 1960 7.015489
6334 KWT Kuwait 1960 NaN
6402 LAO Lao People's DR 1960 NaN
6470 LBN Lebanon 1960 NaN
6538 LBR Liberia 1960 NaN
6606 LCA Saint Lucia 1960 NaN
6674 LKA Sri Lanka 1960 7.927763
6742 LSO Lesotho 1960 6.748340
6810 LTU Lithuania 1960 NaN
6878 LUX Luxembourg 1960 9.856712
6946 LVA Latvia 1960 NaN
7014 MAC China, Macao SAR 1960 NaN
7082 MAR Morocco 1960 7.228742
7150 MDA Republic of Moldova 1960 NaN
7218 MDG Madagascar 1960 7.311968
7286 MDV Maldives 1960 NaN
7354 MEX Mexico 1960 8.681494
7422 MKD North Macedonia 1960 NaN
7490 MLI Mali 1960 6.388793
7558 MLT Malta 1960 7.659323
7626 MMR Myanmar 1960 NaN
7694 MNE Montenegro 1960 NaN
7762 MNG Mongolia 1960 NaN
7830 MOZ Mozambique 1960 6.265391
7898 MRT Mauritania 1960 7.021094
7966 MSR Montserrat 1960 NaN
8034 MUS Mauritius 1960 8.236774
8102 MWI Malawi 1960 6.764879
8170 MYS Malaysia 1960 7.894169
8238 NAM Namibia 1960 8.430022
8306 NER Niger 1960 7.205743
8374 NGA Nigeria 1960 8.285141
8442 NIC Nicaragua 1960 8.404028
8510 NLD Netherlands 1960 9.351271
8578 NOR Norway 1960 9.327065
8646 NPL Nepal 1960 6.545855
8714 NZL New Zealand 1960 9.517157
8782 OMN Oman 1960 NaN
8850 PAK Pakistan 1960 7.084360
8918 PAN Panama 1960 7.926917
8986 PER Peru 1960 7.873620
9054 PHL Philippines 1960 7.540202
9122 POL Poland 1960 NaN
9190 PRT Portugal 1960 8.329927
9258 PRY Paraguay 1960 7.361624
9326 PSE State of Palestine 1960 NaN
9394 QAT Qatar 1960 NaN
9462 ROU Romania 1960 7.250749
9530 RUS Russian Federation 1960 NaN
9598 RWA Rwanda 1960 6.873463
9666 SAU Saudi Arabia 1960 NaN
9734 SDN Sudan 1960 NaN
9802 SEN Senegal 1960 7.923772
9870 SGP Singapore 1960 7.879254
9938 SLE Sierra Leone 1960 NaN
10006 SLV El Salvador 1960 7.611158
10074 SRB Serbia 1960 NaN
10142 STP Sao Tome and Principe 1960 NaN
10210 SUR Suriname 1960 NaN
10278 SVK Slovakia 1960 NaN
10346 SVN Slovenia 1960 NaN
10414 SWE Sweden 1960 9.490262
10482 SWZ Eswatini 1960 NaN
10550 SXM Sint Maarten (Dutch part) 1960 NaN
10618 SYC Seychelles 1960 8.680614
10686 SYR Syrian Arab Republic 1960 7.588956
10754 TCA Turks and Caicos Islands 1960 NaN
10822 TCD Chad 1960 7.209953
10890 TGO Togo 1960 6.967492
10958 THA Thailand 1960 7.008891
11026 TJK Tajikistan 1960 NaN
11094 TKM Turkmenistan 1960 NaN
11162 TTO Trinidad and Tobago 1960 8.879218
11230 TUN Tunisia 1960 7.368283
11298 TUR Turkey 1960 8.452741
11366 TWN Taiwan 1960 7.781291
11434 TZA U.R. of Tanzania: Mainland 1960 7.119707
11502 UGA Uganda 1960 6.697016
11570 UKR Ukraine 1960 NaN
11638 URY Uruguay 1960 8.866385
11706 USA United States 1960 9.769903
11774 UZB Uzbekistan 1960 NaN
11842 VCT St. Vincent and the Grenadines 1960 NaN
11910 VEN Venezuela (Bolivarian Republic of) 1960 8.869224
11978 VGB British Virgin Islands 1960 NaN
12046 VNM Viet Nam 1960 NaN
12114 YEM Yemen 1960 NaN
12182 ZAF South Africa 1960 8.664413
12250 ZMB Zambia 1960 7.883263
12318 ZWE Zimbabwe 1960 7.646267
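
Many of the rows above are NaN because those countries have no data for 1960. Below is a small sketch of the same `loc` pattern followed by `dropna`, on a made-up frame (`toy` and its values are illustrative, not taken from the PWT).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'CCC'],
    'year': [1960, 1961, 1960, 1961, 1960, 1961],
    'lrgdpe_pc': [8.0, 8.1, np.nan, 7.2, 9.5, 9.6],
})

# The boolean mask selects rows; the list selects columns
rows_1960 = toy.loc[toy.year == 1960, ['countrycode', 'year', 'lrgdpe_pc']]

# Drop countries without an observation in that year
with_data = rows_1960.dropna(subset=['lrgdpe_pc'])

print(len(rows_1960), 'rows selected,', len(with_data), 'with data')
```

The plotting methods used below skip missing values, so dropping them is mainly useful when you want to count or list the countries that actually enter the figure.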

gdppc1960 has the data for all countries in the year 1960. We can plot the histogram using the plotting methods of the dataframe.

In [155]:
gdppc1960.lrgdpe_pc.hist()
Out[155]:

We can also plot it using the seaborn package. Let's plot the kernel density of the distribution.

In [168]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-density.pdf', dpi=300, bbox_inches='tight')
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in greater
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in less
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
In [164]:
fig
Out[164]:
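
To make clear what a kernel density plot shows, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy only. It is not seaborn's implementation: the function name `gaussian_kde_1d`, the rule-of-thumb bandwidth, and the sample values are assumptions made for illustration. Missing values are dropped before estimating; comparisons against NaN are the likely source of the RuntimeWarnings printed above.

```python
import numpy as np

def gaussian_kde_1d(values, grid, bandwidth=None):
    """Average of Gaussian bumps centred on each observation (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                      # ignore missing observations
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, other rules are possible
        bandwidth = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)
    z = (grid[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Illustrative sample with one missing value
sample = np.array([6.5, 7.0, 7.2, 7.9, 8.3, np.nan, 9.1, 9.5])
grid = np.linspace(4, 12, 401)
density = gaussian_kde_1d(sample, grid)

# The estimated density should integrate to roughly one over a wide enough grid
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth gives a smoother curve and a smaller one a bumpier curve, which is worth keeping in mind when comparing the shapes of densities across years.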

Let's now also include the distributions for other years, starting with 1980 and then adding 2000.

In [169]:
gdppc1980 = pwt.loc[pwt.year==1980, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-1980-density.pdf', dpi=300, bbox_inches='tight')
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in greater
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in less
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
In [170]:
fig
Out[170]:
In [171]:
gdppc2000 = pwt.loc[pwt.year==2000, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
sns.kdeplot(gdppc2000.lrgdpe_pc, ax=ax, shade=True, label='2000', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-2000-density.pdf', dpi=300, bbox_inches='tight')
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in greater
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in less
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
In [172]:
fig
Out[172]:

Let's show the evolution of the distribution by looking at it every 10 years from 1950 onwards, plus the last available year, 2017. Moreover, let's do everything in a single piece of code.

In [188]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1950, 2020, 10)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
# One density per year, each drawn in its own colour from the palette
for k, t in enumerate(period):
    sns.kdeplot(pwt.loc[pwt.year==t].lrgdpe_pc, ax=ax, shade=True, label=str(t), linewidth=2, color=mycolors[k])
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1950-2010-density.pdf', dpi=300, bbox_inches='tight')
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in greater
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
/Users/ozak/anaconda3/envs/GeoPython36env/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:447: RuntimeWarning: invalid value encountered in less
  X = X[np.logical_and(X > clip[0], X < clip[1])] # won't work for two columns.
In [189]:
fig
Out[189]:
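
The densities can be complemented with a few numbers per year. The sketch below computes the count, mean and standard deviation of log income for the same years used in the loop; a falling standard deviation over time would be one rough sign of convergence. The frame `toy` is synthetic and only mimics the two relevant columns of `pwt`, so its numbers say nothing about the actual data.

```python
import numpy as np
import pandas as pd

# Synthetic panel standing in for pwt (values are made up)
rng = np.random.default_rng(1)
years = list(range(1950, 2020, 10)) + [2017]
toy = pd.DataFrame({
    'year': np.repeat(years, 40),
    'lrgdpe_pc': rng.normal(loc=8.0, scale=1.0, size=40 * len(years)),
})

# Count of non-missing observations, mean and spread for each plotted year
summary = (toy[toy.year.isin(years)]
           .groupby('year')['lrgdpe_pc']
           .agg(['count', 'mean', 'std']))
print(summary)
```

The `count` column matters with real data: the set of countries with observations changes across years, so shifts in the distribution partly reflect changes in the sample.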

Persistence

The lack of convergence in the last 60 years suggests that there is some persistence in (recent) development. Let's explore this by plotting the association between GDP per capita in different periods. To make things more comparable, let's normalize by looking at income levels relative to the US. To do so, it is convenient to use the year as the index of the dataframe, so that the US series aligns with every country's observations by year.

In [252]:
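# Index by year so that the US series aligns with each country's rows
pwt.set_index('year', inplace=True)
# US log GDP per capita, broadcast to every row with the same year
pwt['lrgdpe_pc_US'] = pwt.loc[pwt.countrycode=='USA', 'lrgdpe_pc']
# Ratio of log values relative to the US
pwt['lrgdpe_pc_rel'] = pwt.lrgdpe_pc / pwt.lrgdpe_pc_US
pwt.reset_index(inplace=True)
pwt[['countrycode', 'country', 'year', 'lrgdpe_pc_rel']]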
Out[252]:
countrycode country year lrgdpe_pc_rel
0 ABW Aruba 1950 NaN
1 ABW Aruba 1951 NaN
2 ABW Aruba 1952 NaN
3 ABW Aruba 1953 NaN
4 ABW Aruba 1954 NaN
... ... ... ... ...
12371 ZWE Zimbabwe 2013 0.693611
12372 ZWE Zimbabwe 2014 0.693790
12373 ZWE Zimbabwe 2015 0.692485
12374 ZWE Zimbabwe 2016 0.692220
12375 ZWE Zimbabwe 2017 0.694026

12376 rows × 4 columns
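
An alternative that avoids changing the index in place is to build a year-to-US-value lookup and `map` it onto the `year` column. This is a sketch on a made-up frame (`toy` and its numbers are illustrative); like the cell above, it takes the ratio of log values rather than the log of the ratio.

```python
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['USA', 'USA', 'ZZZ', 'ZZZ'],
    'year': [1960, 1961, 1960, 1961],
    'lrgdpe_pc': [10.0, 10.5, 8.0, 8.4],
})

# Series indexed by year holding the US value for that year
us_by_year = toy.loc[toy.countrycode == 'USA'].set_index('year')['lrgdpe_pc']

# Look up the US value for each row's year, then take the ratio
toy['lrgdpe_pc_US'] = toy['year'].map(us_by_year)
toy['lrgdpe_pc_rel'] = toy['lrgdpe_pc'] / toy['lrgdpe_pc_US']
print(toy[['countrycode', 'year', 'lrgdpe_pc_rel']])
```

Either approach relies on the US having exactly one row per year; with duplicated years the lookup would be ambiguous.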

Let's plot relative income levels in 1960 against those in 1980, 2000 and 2017. First, let's create the wide version of this data, with one row per country and one column per year.

In [263]:
relgdppc = pwt[['countrycode', 'year', 'lrgdpe_pc_rel']].pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
relgdppc.columns = ['y' + str(col) for col in relgdppc.columns]
relgdppc.reset_index(inplace=True)
relgdppc
Out[263]:
countrycode y1950 y1951 y1952 y1953 y1954 y1955 y1956 y1957 y1958 ... y2008 y2009 y2010 y2011 y2012 y2013 y2014 y2015 y2016 y2017
0 ABW NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.989490 0.989011 0.977356 0.972664 0.969597 0.969952 0.968135 0.966769 0.963675 0.962826
1 AGO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.786656 0.767352 0.794583 0.815654 0.815493 0.812440 0.805392 0.786142 0.782010 0.778917
2 AIA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.975120 0.951357 0.943233 0.942697 0.935110 0.931704 0.934498 0.934647 0.928544 0.918428
3 ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.826849 0.836372 0.844121 0.847105 0.848709 0.845853 0.848582 0.851238 0.850862 0.856015
4 ARE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.041859 1.020390 1.013018 1.022369 1.022926 1.024411 1.027438 1.016130 1.014197 1.014201
5 ARG 0.821826 0.820079 0.805977 0.805312 0.809905 0.812385 0.812673 0.815312 0.823219 ... 0.892059 0.887703 0.895089 0.901183 0.898224 0.897675 0.892093 0.891258 0.888385 0.889500
6 ARM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.826112 0.820321 0.827278 0.836538 0.837942 0.838315 0.838879 0.836421 0.836002 0.841809
7 ATG NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.920010 0.915617 0.907847 0.907048 0.907098 0.904721 0.891541 0.893323 0.902276 0.906799
8 AUS 0.992383 0.977070 0.969928 0.975808 0.982563 0.978148 0.977311 0.974658 0.981875 ... 0.981331 0.983655 0.986351 0.988014 0.987315 0.988131 0.986996 0.983612 0.985552 0.985924
9 AUT 0.894721 0.893750 0.896809 0.898472 0.907360 0.911884 0.919295 0.923989 0.930644 ... 0.984842 0.986009 0.986394 0.989584 0.991093 0.989872 0.988264 0.989393 0.991123 0.992133
10 AZE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.865638 0.862286 0.879292 0.899942 0.896536 0.895124 0.891589 0.879511 0.876348 0.878552
11 BDI NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.596627 0.599348 0.603251 0.604358 0.606091 0.609301 0.609966 0.605010 0.605352 0.604612
12 BEL 0.938343 0.939830 0.939478 0.935986 0.942173 0.942207 0.945773 0.945492 0.946531 ... 0.977643 0.979160 0.982812 0.984207 0.984268 0.982180 0.980491 0.981101 0.982194 0.982795
13 BEN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.694326 0.695688 0.692200 0.690875 0.692269 0.694545 0.697620 0.696126 0.691125 0.693375
14 BFA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.658307 0.660050 0.664462 0.668471 0.671214 0.670723 0.672563 0.672185 0.674622 0.677230
15 BGD NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.696692 0.707044 0.715840 0.725394 0.727932 0.731126 0.733644 0.736106 0.740378 0.745037
16 BGR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.882959 0.886850 0.891003 0.893874 0.893553 0.892743 0.894323 0.897233 0.902559 0.905722
17 BHR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.986521 0.973513 0.979706 0.984613 0.978987 0.985273 0.982881 0.970954 0.966895 0.964363
18 BHS NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.963726 0.958568 0.957205 0.954273 0.954349 0.949773 0.947511 0.942740 0.940669 0.937825
19 BIH NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.839189 0.842474 0.843858 0.846667 0.846727 0.849438 0.848991 0.852431 0.856142 0.859792
20 BLR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.888248 0.887729 0.893470 0.908070 0.905703 0.899518 0.896490 0.893113 0.888399 0.891426
21 BLZ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.823950 0.824100 0.826376 0.827130 0.827783 0.825726 0.825148 0.825741 0.825603 0.828015
22 BMU NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.020722 1.020868 1.020488 1.018261 1.013704 1.012768 1.010910 1.011030 1.012740 1.014179
23 BOL 0.773728 0.780894 0.776158 0.755933 0.758773 0.761211 0.750732 0.745349 0.746151 ... 0.778781 0.781265 0.789240 0.796940 0.800176 0.802180 0.799913 0.794074 0.794556 0.798382
24 BRA 0.768846 0.766391 0.771167 0.768165 0.775958 0.775628 0.775693 0.781227 0.785688 ... 0.863076 0.868590 0.879980 0.888472 0.887159 0.887502 0.884818 0.878136 0.874273 0.873692
25 BRB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.914548 0.907886 0.900727 0.895369 0.890439 0.884899 0.880209 0.879332 0.878337 0.872575
26 BRN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.046930 1.023243 1.025840 1.042257 1.040066 1.031201 1.027272 1.004781 0.993333 0.997052
27 BTN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.801292 0.809750 0.811314 0.812488 0.808730 0.820635 0.815524 0.817367 0.822358 0.823951
28 BWA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.872592 0.866102 0.874888 0.881371 0.878271 0.886106 0.889328 0.885029 0.891685 0.886532
29 CAF NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.624067 0.623044 0.624603 0.630166 0.632011 0.589252 0.589411 0.597103 0.602836 0.602471
30 CAN 0.972769 0.969030 0.973049 0.972306 0.970729 0.971913 0.977645 0.975641 0.977093 ... 0.981444 0.977118 0.979261 0.981826 0.981692 0.982934 0.983605 0.978585 0.977407 0.978715
31 CHE 0.997868 0.997500 0.995775 0.996462 1.004213 1.004081 1.008410 1.009722 1.008804 ... 1.006785 1.007711 1.007334 1.011278 1.011717 1.010979 1.010721 1.011344 1.010043 1.010414
32 CHL NaN 0.872579 0.877925 0.874568 0.881074 0.873385 0.870305 0.875079 0.877864 ... 0.895782 0.899857 0.911068 0.918108 0.920518 0.922063 0.923194 0.920667 0.921863 0.922364
33 CHN NaN NaN 0.722989 0.731520 0.722837 0.727335 0.730767 0.731220 0.728843 ... 0.825483 0.834918 0.845907 0.853163 0.854808 0.858580 0.861441 0.861366 0.863892 0.866567
34 CIV NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.714629 0.722925 0.726757 0.725079 0.727962 0.732360 0.737138 0.742739 0.748049 0.748936
35 CMR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.729108 0.730184 0.730367 0.732345 0.733191 0.734960 0.736212 0.735478 0.736312 0.736770
36 COD 0.789034 0.797435 0.799461 0.808269 0.811456 0.811749 0.812299 0.807378 0.808006 ... 0.589669 0.590840 0.598778 0.601935 0.610430 0.616539 0.618094 0.617758 0.610901 0.611106
37 COG NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.757451 0.746950 0.781787 0.796128 0.794774 0.778578 0.781629 0.754292 0.751128 0.770845
38 COL 0.840497 0.833150 0.834817 0.838358 0.847787 0.842493 0.841054 0.836814 0.834562 ... 0.846773 0.850654 0.855923 0.864475 0.865770 0.867370 0.867543 0.864914 0.864813 0.865956
39 COM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.717034 0.720608 0.715774 0.721674 0.724280 0.728789 0.729745 0.728174 0.727176 0.726858
40 CPV NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.793276 0.797971 0.799167 0.803328 0.802403 0.802221 0.800012 0.799605 0.800980 0.804988
41 CRI 0.842101 0.839944 0.850906 0.864637 0.868607 0.866711 0.859493 0.862927 0.872064 ... 0.866234 0.873884 0.875044 0.877380 0.879234 0.879415 0.880409 0.882526 0.885172 0.886736
42 CUW NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.950154 0.949456 0.945779 0.943836 0.940764 0.937672 0.934520 0.931063 0.928522 0.924971
43 CYM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.019213 1.012699 1.006567 1.005275 1.003192 1.001566 1.000854 0.999521 1.000373 0.999812
44 CYP 0.825718 0.825829 0.835757 0.844800 0.858132 0.854021 0.864445 0.859262 0.847587 ... 0.969011 0.968440 0.965645 0.962122 0.956281 0.948474 0.944998 0.947666 0.952160 0.955377
45 CZE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.948833 0.950300 0.948489 0.949990 0.948266 0.948626 0.950502 0.953424 0.955675 0.959502
46 DEU 0.882154 0.886635 0.894688 0.900197 0.910199 0.917643 0.924163 0.928908 0.934887 ... 0.978357 0.977311 0.980649 0.984666 0.983802 0.983427 0.983898 0.984285 0.986127 0.987802
47 DJI NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.740108 0.725002 0.713312 0.719192 0.720585 0.726598 0.731688 0.731214 0.757381 0.747389
48 DMA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.849542 0.851468 0.851440 0.851696 0.848754 0.846682 0.848562 0.844032 0.844858 0.833739
49 DNK 0.956605 0.946611 0.947823 0.951207 0.956385 0.949976 0.951380 0.954937 0.959487 ... 0.984482 0.983591 0.986675 0.987456 0.985368 0.984578 0.982841 0.983826 0.985183 0.988665
50 DOM NaN 0.798953 0.788876 0.780361 0.789301 0.780929 0.786620 0.796738 0.792253 ... 0.850751 0.856890 0.861435 0.861753 0.862088 0.863997 0.867687 0.872359 0.876274 0.877345
51 DZA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.874543 0.861850 0.871658 0.878085 0.877463 0.873085 0.868944 0.858761 0.857976 0.859770
52 ECU 0.836787 0.833156 0.837977 0.838874 0.849365 0.845880 0.845635 0.846080 0.847000 ... 0.838463 0.837822 0.841216 0.849340 0.851005 0.853045 0.853829 0.847261 0.843338 0.843973
53 EGY 0.672814 0.670662 0.658599 0.654224 0.659640 0.657818 0.658720 0.660437 0.661107 ... 0.818123 0.827465 0.837063 0.847521 0.846008 0.845862 0.844801 0.844407 0.845257 0.844111
54 ESP 0.850588 0.861282 0.863036 0.855552 0.869588 0.868909 0.875049 0.878579 0.884267 ... 0.965878 0.964946 0.961702 0.959633 0.956967 0.954875 0.954744 0.957303 0.959745 0.962427
55 EST NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.929537 0.923474 0.926638 0.936374 0.938834 0.939492 0.939169 0.939016 0.941128 0.945714
56 ETH 0.625201 0.634006 0.635001 0.637912 0.638299 0.634833 0.637652 0.634982 0.635359 ... 0.618220 0.627408 0.636359 0.644854 0.649787 0.656555 0.663061 0.667388 0.671274 0.670327
57 FIN 0.905330 0.917763 0.916142 0.907601 0.920705 0.923735 0.924111 0.922254 0.923569 ... 0.981857 0.978832 0.979236 0.982102 0.979201 0.976115 0.972548 0.972443 0.973820 0.975386
58 FJI NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.809885 0.813373 0.817131 0.823200 0.822376 0.825760 0.828566 0.829393 0.827936 0.828782
59 FRA 0.923894 0.922819 0.923578 0.924580 0.931753 0.930981 0.934737 0.939012 0.943488 ... 0.970412 0.970779 0.972160 0.973887 0.971635 0.971975 0.969762 0.969739 0.969994 0.971184
60 GAB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.884255 0.864740 0.876714 0.889896 0.888110 0.884127 0.874189 0.867116 0.864891 0.857876
61 GBR 0.951288 0.947420 0.946459 0.949101 0.954855 0.952985 0.954909 0.956145 0.959168 ... 0.972982 0.969767 0.971452 0.970734 0.970766 0.971721 0.971755 0.973084 0.973171 0.973743
62 GEO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.809721 0.816492 0.821171 0.830960 0.835935 0.841100 0.843832 0.848752 0.852192 0.854897
63 GHA NaN NaN NaN NaN NaN 0.844072 0.841253 0.834688 0.844094 ... 0.760625 0.759148 0.764570 0.777262 0.779910 0.782735 0.784611 0.778521 0.778504 0.781599
64 GIN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.679983 0.676657 0.686349 0.686519 0.691516 0.694039 0.692701 0.691219 0.688824 0.701354
65 GMB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.720736 0.723535 0.724502 0.713134 0.714048 0.716462 0.711980 0.713187 0.709999 0.709893
66 GNB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.664856 0.663160 0.664909 0.674231 0.661932 0.661399 0.663091 0.671390 0.677159 0.683119
67 GNQ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.950634 0.925045 0.935989 0.964934 0.966290 0.951642 0.942008 0.904826 0.884087 0.887223
68 GRC NaN 0.842899 0.837696 0.845969 0.848648 0.850391 0.858060 0.864338 0.871067 ... 0.955861 0.956000 0.946796 0.937038 0.930737 0.929435 0.928567 0.927226 0.926309 0.927369
69 GRD NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.858477 0.856843 0.856775 0.858493 0.855625 0.856275 0.860564 0.862850 0.864653 0.867331
70 GTM 0.806982 0.800549 0.797943 0.796264 0.798112 0.791359 0.796160 0.796913 0.798352 ... 0.802336 0.807444 0.808583 0.812355 0.811541 0.811535 0.812559 0.814770 0.815968 0.815584
71 HKG NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.994380 0.994094 0.996213 0.998732 0.997467 0.998677 0.998469 0.998574 0.998879 1.000002
72 HND 0.801858 0.797798 0.794723 0.796978 0.788245 0.781687 0.790889 0.787631 0.790154 ... 0.757062 0.759044 0.761052 0.764584 0.764323 0.761590 0.763393 0.766715 0.768131 0.769670
73 HRV NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.920535 0.919744 0.916748 0.918994 0.918303 0.916918 0.914701 0.917394 0.921033 0.923662
74 HTI NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.676782 0.682784 0.674102 0.676584 0.674716 0.677433 0.676991 0.678025 0.676392 0.672426
75 HUN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.922391 0.924999 0.927664 0.930766 0.928930 0.929973 0.930070 0.932322 0.931926 0.934821
76 IDN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.800243 0.811339 0.823982 0.839878 0.840811 0.843128 0.843945 0.844345 0.846387 0.849606
77 IND 0.702181 0.700865 0.700577 0.701794 0.706101 0.702366 0.704201 0.701786 0.708579 ... 0.749668 0.760344 0.771244 0.777388 0.780177 0.783953 0.786804 0.790684 0.795677 0.799698
78 IRL 0.890153 0.882560 0.888076 0.890582 0.894144 0.891403 0.889915 0.889701 0.892815 ... 0.991596 0.986747 0.987506 0.988670 0.990479 0.989520 0.993672 1.020108 1.025107 1.024363
79 IRN NaN NaN NaN NaN NaN 0.780675 0.784023 0.792398 0.802467 ... 0.895347 0.896951 0.902770 0.906959 0.892126 0.891304 0.888624 0.880549 0.883389 0.885114
80 IRQ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.814686 0.832056 0.846350 0.869018 0.875402 0.891055 0.888232 0.882807 0.892599 0.889018
81 ISL 0.938588 0.926949 0.923085 0.934245 0.945011 0.952176 0.949190 0.947528 0.957632 ... 1.002560 0.989168 0.978915 0.978941 0.978430 0.978698 0.978717 0.983013 0.988297 0.991762
82 ISR 0.910803 0.915102 0.894779 0.888277 0.904024 0.906123 0.911502 0.913861 0.921327 ... 0.948999 0.953180 0.954596 0.956453 0.957835 0.960991 0.963207 0.963763 0.965971 0.966018
83 ITA 0.872794 0.875283 0.876086 0.880613 0.887068 0.888189 0.891254 0.895325 0.902244 ... 0.970207 0.970296 0.969640 0.970999 0.968283 0.965320 0.962766 0.963476 0.966825 0.967764
84 JAM NaN NaN NaN 0.839336 0.852501 0.854699 0.861270 0.874636 0.875993 ... 0.818012 0.820307 0.820550 0.822882 0.820657 0.818804 0.816889 0.817606 0.819082 0.818480
85 JOR NaN NaN NaN NaN 0.785721 0.758324 0.792702 0.790877 0.802057 ... 0.826507 0.840235 0.838909 0.842384 0.839344 0.835352 0.831846 0.832823 0.830927 0.829863
86 JPN 0.821019 0.826165 0.832060 0.832760 0.839845 0.842075 0.847394 0.852980 0.862923 ... 0.968363 0.967784 0.969599 0.968196 0.968700 0.969388 0.968461 0.968683 0.969353 0.969127
87 KAZ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.893340 0.890903 0.905487 0.923823 0.922073 0.926190 0.925867 0.915855 0.914375 0.916688
88 KEN 0.768899 0.779579 0.766068 0.756999 0.769194 0.769329 0.768174 0.768888 0.768473 ... 0.706742 0.713946 0.718672 0.721985 0.724428 0.726013 0.725608 0.728278 0.731793 0.731740
89 KGZ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.729085 0.734575 0.735769 0.753012 0.744731 0.749734 0.752178 0.752662 0.757132 0.759836
90 KHM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.719030 0.720806 0.721569 0.726724 0.731110 0.736357 0.736261 0.738574 0.743230 0.745832
91 KNA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.915302 0.917887 0.917186 0.921422 0.920520 0.923995 0.926457 0.924956 0.924795 0.924390
92 KOR NaN NaN NaN 0.715474 0.718426 0.720221 0.718617 0.726651 0.728503 ... 0.951319 0.954880 0.959193 0.959192 0.958016 0.958240 0.957880 0.959952 0.962081 0.963628
93 KWT NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.046829 1.018751 1.022714 1.038092 1.039998 1.031425 1.017135 0.981995 0.971963 0.965371
94 LAO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.748565 0.762365 0.767085 0.779856 0.779683 0.787447 0.791443 0.793085 0.801821 0.803718
95 LBN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.883960 0.901916 0.901544 0.892879 0.890465 0.886736 0.881673 0.881507 0.882050 0.880319
96 LBR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.609365 0.609619 0.608001 0.607710 0.613754 0.612879 0.617955 0.620224 0.616863 0.616722
97 LCA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.855505 0.859636 0.861222 0.866972 0.864810 0.861826 0.863381 0.859594 0.860997 0.864881
98 LKA 0.829122 0.829264 0.811901 0.811032 0.822648 0.823664 0.809627 0.807262 0.811445 ... 0.811708 0.826390 0.833379 0.840341 0.846209 0.849413 0.853065 0.857034 0.859668 0.862714
99 LSO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.717665 0.713316 0.715903 0.723104 0.722556 0.724756 0.729728 0.734069 0.732670 0.732103
100 LTU NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.921116 0.912110 0.918561 0.927250 0.931976 0.935220 0.937369 0.938563 0.940567 0.945485
101 LUX 0.978073 0.994925 1.005660 0.987471 0.989761 0.990000 1.000349 1.001989 1.000761 ... 1.052345 1.047284 1.050350 1.056235 1.053553 1.051059 1.053563 1.053203 1.053184 1.052290
102 LVA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.915370 0.904828 0.906316 0.914725 0.916926 0.919595 0.921250 0.924139 0.926338 0.931824
103 MAC NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.021221 1.028977 1.049711 1.070128 1.075309 1.082032 1.077764 1.050814 1.045944 1.049962
104 MAR 0.748514 0.745479 0.747818 0.745886 0.751155 0.743026 0.741540 0.736771 0.743240 ... 0.792448 0.804277 0.806974 0.813973 0.814091 0.816790 0.817900 0.820308 0.820623 0.821836
105 MDA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.745076 0.750785 0.760414 0.772893 0.772335 0.778548 0.780789 0.780190 0.784953 0.787809
106 MDG NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.665474 0.663319 0.668637 0.673080 0.673831 0.672013 0.672213 0.670443 0.670881 0.674359
107 MDV NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.877127 0.868903 0.869359 0.876705 0.876760 0.883123 0.885460 0.884219 0.884345 0.887684
108 MEX 0.878292 0.878840 0.879064 0.870472 0.880443 0.880448 0.883704 0.887408 0.891104 ... 0.884842 0.883649 0.888029 0.893016 0.895041 0.895498 0.898892 0.896750 0.897227 0.897777
109 MKD NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.852737 0.860032 0.861764 0.862356 0.861285 0.864337 0.866155 0.868867 0.873002 0.873107
110 MLI NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.676276 0.689931 0.689539 0.694571 0.699731 0.698464 0.697242 0.705309 0.709451 0.712130
111 MLT NaN NaN NaN NaN 0.760923 0.757014 0.767266 0.770374 0.779890 ... 0.942079 0.943913 0.948256 0.948142 0.948526 0.950544 0.954221 0.961917 0.966661 0.973109
112 MMR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.725540 0.736820 0.755500 0.763556 0.771034 0.779176 0.784092 0.783298 0.783633 0.790095
113 MNE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.874333 0.875858 0.880293 0.882525 0.874768 0.877287 0.876890 0.880194 0.886482 0.890605
114 MNG NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.812841 0.809217 0.822383 0.837710 0.847950 0.853677 0.859433 0.856769 0.859407 0.858972
115 MOZ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.637390 0.637529 0.633096 0.631257 0.634278 0.643113 0.649369 0.652992 0.653609 0.648914
116 MRT NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.729161 0.728707 0.742001 0.744348 0.736372 0.741391 0.733310 0.735376 0.731215 0.732068
117 MSR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.904961 0.907656 0.902031 0.904568 0.906921 0.906916 0.907238 0.907074 0.906401 0.902111
118 MUS 0.885870 0.881621 0.880364 0.873783 0.871259 0.868305 0.870004 0.869793 0.864125 ... 0.888116 0.889695 0.889239 0.897254 0.896671 0.901559 0.903763 0.908411 0.912719 0.917005
119 MWI NaN NaN NaN NaN 0.683645 0.684251 0.689007 0.694683 0.695934 ... 0.651161 0.640319 0.635193 0.642898 0.634495 0.629728 0.636284 0.641200 0.635272 0.632007
120 MYS NaN NaN NaN NaN NaN 0.802702 0.799986 0.794300 0.789766 ... 0.913967 0.912787 0.916355 0.922676 0.923473 0.923812 0.926035 0.925673 0.926199 0.929592
121 NAM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.829674 0.832605 0.836021 0.840918 0.847016 0.849945 0.853864 0.855430 0.854593 0.852105
122 NER NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.610240 0.609348 0.612548 0.613852 0.620119 0.620720 0.622115 0.620889 0.621781 0.621275
123 NGA 0.821119 0.821600 0.828503 0.820892 0.830080 0.829149 0.822887 0.824551 0.833421 ... 0.776147 0.785307 0.791310 0.790924 0.796656 0.792156 0.790650 0.780300 0.769020 0.766613
124 NIC 0.847579 0.853261 0.861792 0.862358 0.867451 0.868499 0.864448 0.870081 0.870227 ... 0.761384 0.764138 0.765127 0.767676 0.770444 0.771679 0.775494 0.778516 0.782691 0.785194
125 NLD 0.933048 0.929696 0.936443 0.940446 0.944918 0.948507 0.950959 0.951560 0.954190 ... 0.994590 0.992399 0.991553 0.992550 0.990834 0.990262 0.985953 0.988003 0.987061 0.989431
126 NOR 0.945810 0.947046 0.947787 0.947053 0.951615 0.949691 0.953767 0.956422 0.956077 ... 1.017387 1.011585 1.013732 1.017278 1.019653 1.018860 1.014241 1.005697 1.001095 1.003074
127 NPL NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.675247 0.684213 0.693255 0.698804 0.702210 0.703066 0.706746 0.707937 0.708141 0.713608
128 NZL 0.980823 0.966463 0.959215 0.959746 0.969909 0.966121 0.964868 0.965712 0.965557 ... 0.954018 0.956321 0.956988 0.958466 0.958196 0.961884 0.963649 0.963334 0.965276 0.966633
129 OMN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.993545 0.977568 0.980246 0.988764 0.992848 0.986713 0.976033 0.952333 0.943865 0.937177
130 PAK 0.750571 0.743857 0.737029 0.735139 0.735825 0.726525 0.729578 0.727713 0.727192 ... 0.756503 0.764550 0.768118 0.774012 0.772691 0.773536 0.774782 0.774969 0.778169 0.780762
131 PAN 0.801787 0.790703 0.795948 0.797142 0.803734 0.804421 0.804443 0.811024 0.812316 ... 0.874387 0.883666 0.887055 0.896797 0.901053 0.905724 0.907221 0.910562 0.912871 0.916051
132 PER 0.797862 0.800084 0.797713 0.796330 0.804069 0.800832 0.801509 0.805373 0.801462 ... 0.834469 0.837789 0.846986 0.854276 0.856069 0.857164 0.856132 0.854562 0.856406 0.858164
133 PHL 0.755986 0.751658 0.752383 0.760828 0.766636 0.766731 0.767258 0.770051 0.772054 ... 0.782356 0.788165 0.795006 0.799125 0.801347 0.806132 0.808168 0.810313 0.814590 0.817467
134 POL NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.909608 0.916652 0.922715 0.927984 0.929680 0.929439 0.928977 0.932786 0.933993 0.937730
135 PRT 0.823873 0.828902 0.830442 0.833744 0.840486 0.838145 0.842234 0.844907 0.846971 ... 0.942969 0.944997 0.946357 0.941998 0.938120 0.939222 0.938646 0.940618 0.934007 0.932883
136 PRY NaN 0.766044 0.760054 0.751417 0.758220 0.759244 0.757840 0.754804 0.760532 ... 0.800251 0.804540 0.816164 0.823496 0.821762 0.828479 0.827750 0.824564 0.825467 0.825087
137 PSE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.755302 0.762222 0.766128 0.774098 0.778010 0.772988 0.768651 0.768696 0.769505 0.767131
138 QAT NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.087721 1.067564 1.080146 1.103178 1.096468 1.089433 1.082030 1.055126 1.047842 1.052828
139 ROU NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.897751 0.900442 0.901510 0.904593 0.907758 0.909204 0.909446 0.913088 0.919540 0.926958
140 RUS NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.920427 0.913704 0.923056 0.934114 0.936232 0.935400 0.933698 0.924527 0.918864 0.920651
141 RWA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.650603 0.659563 0.663086 0.671237 0.674929 0.676602 0.680740 0.686577 0.689980 0.692659
142 SAU NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.983716 0.968196 0.982908 1.003649 1.003724 0.998523 0.992056 0.973992 0.967301 0.968732
143 SDN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.727641 0.742474 0.755000 0.770855 0.761832 0.758547 0.761796 0.764976 0.766659 0.765492
144 SEN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.723681 0.731640 0.731346 0.730002 0.728759 0.730673 0.731325 0.734115 0.735562 0.736296
145 SGP NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.017152 1.018395 1.030180 1.033458 1.033005 1.032616 1.031121 1.030679 1.030610 1.032185
146 SLE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.642262 0.648480 0.648221 0.651709 0.663824 0.674398 0.662957 0.654880 0.661620 0.662711
147 SLV 0.784550 0.783648 0.785199 0.786881 0.791986 0.788184 0.789190 0.793821 0.788234 ... 0.800413 0.802586 0.804741 0.809071 0.809641 0.809498 0.809209 0.808953 0.810266 0.809530
148 SRB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.866963 0.870386 0.869537 0.874036 0.873295 0.875031 0.871492 0.873474 0.876276 0.877450
149 STP NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.720714 0.724872 0.728053 0.738867 0.738662 0.742628 0.741594 0.735790 0.747232 0.745787
150 SUR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.864767 0.872171 0.879786 0.886340 0.886530 0.887194 0.884602 0.878951 0.872259 0.869469
151 SVK NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.934067 0.933734 0.939439 0.940094 0.940488 0.940295 0.939861 0.941005 0.942699 0.943987
152 SVN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.954524 0.949700 0.948700 0.949814 0.947380 0.946420 0.946009 0.947024 0.949080 0.953978
153 SWE 0.959013 0.959385 0.956189 0.955610 0.963300 0.960652 0.962981 0.964198 0.968789 ... 0.985118 0.982096 0.984474 0.987026 0.985869 0.983715 0.981699 0.984114 0.983615 0.985448
154 SWZ NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.817189 0.822072 0.821724 0.821067 0.822296 0.824398 0.822561 0.821989 0.819052 0.817937
155 SXM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.972944 0.975040 0.973026 0.969903 0.961685 0.958435 0.957267 0.961250 0.955064 0.945591
156 SYC NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.908114 0.901718 0.912151 0.921107 0.922659 0.939564 0.933999 0.938806 0.937110 0.938155
157 SYR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.790021 0.792317 0.798099 0.797425 0.780933 0.764504 0.766567 0.765168 0.764292 0.766390
158 TCA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.942082 0.922271 0.920965 0.921774 0.915077 0.914632 0.917720 0.917201 0.918779 0.919810
159 TCD NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.681811 0.674660 0.696383 0.699931 0.698079 0.696679 0.676753 0.680567 0.669394 0.660316
160 TGO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.649808 0.653946 0.656420 0.659513 0.661516 0.662832 0.665484 0.667920 0.670478 0.671573
161 THA 0.726703 0.719432 0.712702 0.700380 0.706645 0.694964 0.696141 0.704517 0.706005 ... 0.862282 0.868378 0.876420 0.878806 0.882541 0.883477 0.882296 0.884426 0.886116 0.888974
162 TJK NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.698403 0.704543 0.715019 0.720774 0.728454 0.733692 0.736847 0.733837 0.739360 0.746008
163 TKM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.859742 0.879565 0.885944 0.894308 0.898644 0.907156 0.912394 0.913424 0.916505 0.919224
164 TTO 0.866410 0.869756 0.868577 0.875669 0.881368 0.883176 0.894238 0.903699 0.910191 ... 0.952716 0.934552 0.945327 0.956691 0.955455 0.971621 0.978565 0.985092 0.975667 0.975614
165 TUN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.847772 0.854141 0.855586 0.852411 0.852803 0.853131 0.851320 0.850571 0.850574 0.849871
166 TUR 0.837983 0.851623 0.856978 0.863982 0.854459 0.856067 0.854907 0.873356 0.882210 ... 0.896330 0.895755 0.904928 0.912813 0.916209 0.919884 0.924316 0.929491 0.929106 0.931848
167 TWN NaN 0.767463 0.775383 0.779476 0.785136 0.784633 0.784853 0.788900 0.793906 ... 0.970471 0.973165 0.979428 0.980183 0.979526 0.981261 0.983818 0.985267 0.985322 0.985355
168 TZA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.686367 0.691908 0.697869 0.703703 0.703475 0.703654 0.709155 0.709853 0.712226 0.713733
169 UGA 0.705356 0.723098 0.711743 0.684258 0.693323 0.686379 0.684303 0.697462 0.692180 ... 0.671768 0.681307 0.684623 0.688753 0.687433 0.688185 0.688433 0.688913 0.685271 0.684645
170 UKR NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.846608 0.835407 0.836904 0.846481 0.849057 0.850634 0.847099 0.834491 0.833976 0.838499
171 URY 0.911823 0.926002 0.913065 0.912498 0.927041 0.922357 0.918988 0.921716 0.914166 ... 0.877229 0.886997 0.897333 0.905073 0.907070 0.910395 0.911326 0.908214 0.908352 0.908335
172 USA 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
173 UZB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.794516 0.802370 0.809713 0.813207 0.818179 0.824285 0.827664 0.832459 0.835571 0.836773
174 VCT NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.846910 0.848190 0.845434 0.845194 0.841361 0.840666 0.842785 0.843363 0.843806 0.843122
175 VEN 0.905132 0.906186 0.912842 0.910820 0.919961 0.917171 0.919618 0.928464 0.926555 ... 0.886865 0.874588 0.889380 0.898629 0.897463 0.888132 0.869765 0.816462 0.816398 0.820951
176 VGB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.957352 0.954930 0.950693 0.947234 0.939215 0.940765 0.938551 0.934362 0.931747 0.929345
177 VNM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.755160 0.761839 0.771784 0.778270 0.783986 0.786128 0.789185 0.791125 0.796555 0.801899
178 YEM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.755317 0.757512 0.772611 0.761339 0.750812 0.750272 0.737346 0.711625 0.697790 0.684275
179 ZAF 0.893115 0.887755 0.875630 0.881956 0.889942 0.886195 0.889469 0.892274 0.891707 ... 0.862708 0.862971 0.864622 0.867436 0.866010 0.865265 0.863675 0.862188 0.860596 0.860410
180 ZMB NaN NaN NaN NaN NaN 0.813323 0.816613 0.796724 0.785549 ... 0.721515 0.732669 0.742396 0.750338 0.753036 0.753754 0.753928 0.751828 0.751907 0.757633
181 ZWE NaN NaN NaN NaN 0.769047 0.764846 0.770286 0.776759 0.775427 ... 0.618364 0.670103 0.674767 0.682105 0.690134 0.693611 0.693790 0.692485 0.692220 0.694026

182 rows × 69 columns
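
The renaming step above turns integer year labels into names like y1960, which can then be used as attributes (relgdppc.y1960). As a minimal sketch of the same idea, assume a long-format table with one row per country and year; the toy values and the names long_df and wide are made up for illustration and are not the PWT data.

```python
import pandas as pd

# Toy long-format data (hypothetical values, not from the Penn World Tables)
long_df = pd.DataFrame({
    'countrycode': ['USA', 'USA', 'MEX', 'MEX'],
    'year': [1960, 2017, 1960, 2017],
    'relgdppc': [1.0, 1.0, 0.88, 0.90],
})

# One row per country, one column per year
wide = long_df.pivot(index='countrycode', columns='year', values='relgdppc')

# Prefix the integer year labels so they become valid attribute names (wide.y1960)
wide.columns = ['y' + str(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

After reset_index, countrycode is an ordinary column again, which is what the plotting code below relies on when it labels each point.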

In [273]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
k = 0
fig, ax = plt.subplots()
ax.plot([relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], [relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], c='r', label='45 degree')
sns.regplot(x='y1960', y='y2017', data=relgdppc, ax=ax, label='1960-2017')
movex = relgdppc.y1960.mean() * 0.006125
movey = relgdppc.y2017.mean() * 0.006125
for line in range(relgdppc.shape[0]):
    # Label only the countries observed in both years
    if pd.notna(relgdppc.y1960[line]) and pd.notna(relgdppc.y2017[line]):
        ax.text(relgdppc.y1960[line]+movex, relgdppc.y2017[line]+movey, relgdppc.countrycode[line], horizontalalignment='left', fontsize=12, color='black', weight='semibold')
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in 2017] relative to US')
ax.legend()
plt.savefig(pathgraphs + '1960_versus_2017_drop.pdf', dpi=300, bbox_inches='tight')
In [274]:
fig
Out[274]:

Let's create a function that simplifies plotting this figure for any pair of years.

In [286]:
def PersistencePlot(dfin, var0='y1960', var1='y2010', labelvar='countrycode', 
                    dx=0.006125, dy=0.006125, 
                    xlabel='Log[Income per capita 1960] relative to US', 
                    ylabel='Log[Income per capita in 2010] relative to US',
                    linelabel='1960-2010',
                    filename='1960_versus_2010_drop.pdf'):
    '''
    Plot the association between var0 and var1 in the dataframe dfin, using labelvar to label each point.
    The figure is saved to pathgraphs + filename.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    k = 0
    fig, ax = plt.subplots()
    ax.plot([df[var0].min()*.99, df[var0].max()*1.01], [df[var0].min()*.99, df[var0].max()*1.01], c='r', label='45 degree')
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=12, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
    pass
In [287]:
PersistencePlot(relgdppc, var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
In [289]:
PersistencePlot(relgdppc.loc[(relgdppc.countrycode!='BRN')& (relgdppc.countrycode!='ARE')], var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
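
Each call to PersistencePlot repeats the years in the variable names, the axis labels, the legend label, and the file name, so it is easy to forget one of them. A possible way to avoid this is a small helper that builds all the keyword arguments from a pair of years. The function persistence_kwargs and its suffix parameter are hypothetical and only sketch the idea.

```python
def persistence_kwargs(year0, year1, suffix=''):
    """Build keyword arguments for PersistencePlot from a pair of years (hypothetical helper)."""
    return {
        'var0': 'y{}'.format(year0),
        'var1': 'y{}'.format(year1),
        'xlabel': 'Log[Income per capita {}] relative to US'.format(year0),
        'ylabel': 'Log[Income per capita in {}] relative to US'.format(year1),
        'linelabel': '{}-{}'.format(year0, year1),
        'filename': '{}_versus_{}{}.pdf'.format(year0, year1, suffix),
    }

kwargs = persistence_kwargs(1980, 2010, suffix='_drop')
print(kwargs['linelabel'], kwargs['filename'])
# With PersistencePlot and relgdppc defined as above, one could then call:
# PersistencePlot(relgdppc, **kwargs)
```
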
In [241]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1980, 2020, 20)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
k = 0
fig, ax = plt.subplots()
for t in period:
    sns.regplot(x='y1960', y='y'+str(t), data=relgdppc, ax=ax, label='1960-'+str(t))
    k += 1
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in other period] relative to US')
ax.legend()
Out[241]:

Create a plot of this measure of GDP vs Population for all countries in the last year of the dataset

In [ ]:
# Let's figure out the last period
lastperiod = dfpwt.iloc[-1].year
print(lastperiod)
In [ ]:
# Select data for last period
dflast = dfpwt.loc[dfpwt.year==lastperiod]
In [ ]:
dflast
In [ ]:
ax = dflast.plot.scatter(x='pop', y='rgdpe', c='rgdppce', cmap='Reds')
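
Taking the year from the last row with iloc[-1] works when the rows are sorted so that the latest year comes last. If you are not sure about the ordering, taking the maximum of the year column is a safer choice. Here is a small sketch on a toy panel; the data frame toy and its numbers are made up for illustration.

```python
import pandas as pd

# Toy panel (hypothetical numbers); the last row is deliberately NOT the latest year
toy = pd.DataFrame({
    'countrycode': ['USA', 'MEX', 'USA', 'MEX'],
    'year': [2017, 2017, 2016, 2016],
    'pop': [325.0, 124.0, 323.0, 123.0],
})

last_year = toy['year'].max()                  # does not depend on row order
toy_last = toy.loc[toy['year'] == last_year]   # keep only the rows for that year
print(last_year)
print(toy_last)
```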

Use statistical and mathematical functions to analyze the data

In [ ]:
# Describe the data
dfpwt.describe()
In [ ]:
dflast.describe()
In [ ]:
dflast[['rgdpe', 'rgdppce', 'pop']].corr()

Exercise:

  1. Create GDPpc measures based on all other measures of GDP
  2. Compare these measures using plot, correlations, etc.
In [ ]:
dfincome=pd.read_csv('./WDI/wdigdppc.csv', skiprows=2)
dfmobile=pd.read_excel('./WDI/wdimobile.xls', skiprows=2)

Now you should have a data frame with the data you downloaded.

Let's see what they look like...

In [ ]:
dfincome
In [ ]:
dfmobile

Notice that these data frames look like spreadsheets or data tables. Columns have names that can be used to call the data, e.g.

In [ ]:
dfmobile.columns
In [ ]:
dfmobile['Country Code']

We can use columns to compute additional information. For example, the growth rate of income per capita between 2000 and 2005 is

In [ ]:
growth=np.log(dfincome['2005'])-np.log(dfincome['2000'])
In [ ]:
growth
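
The difference in logs is a growth rate because ln(y2005) - ln(y2000) = ln(y2005/y2000), which is close to the percentage change (y2005 - y2000)/y2000 when the change is small. A quick check with made-up numbers (the series y2000 and y2005 below are hypothetical, not the WDI data):

```python
import numpy as np
import pandas as pd

# Hypothetical income levels for three economies
y2000 = pd.Series([100.0, 200.0, 50.0])
y2005 = pd.Series([105.0, 300.0, 50.0])

log_growth = np.log(y2005) - np.log(y2000)   # log difference
pct_growth = (y2005 - y2000) / y2000         # simple percentage change

print(pd.DataFrame({'log_growth': log_growth, 'pct_growth': pct_growth}))
```

For the first economy the two measures almost coincide, while for the second one (a 50% increase) the log difference is noticeably smaller than the percentage change.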

Notice that we cannot tell which country each growth rate belongs to. We could create a column in dfincome that holds the growth value, or we can change the index of the data frame so that it keeps the country code (this is useful!)

In [ ]:
dfincome['growth']=np.log(dfincome['2005'])-np.log(dfincome['2000'])
In [ ]:
dfincome[['Country Code','growth']]
In [ ]:
dfincome.set_index('Country Code', inplace=True)
In [ ]:
growth=np.log(dfincome['2005'])-np.log(dfincome['2000'])
growth.name='growth'
growth
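
To see why changing the index helps, here is a small sketch with a toy data frame (the country codes AAA and BBB and all numbers are made up). Any series computed from the columns inherits the index of the data frame, so once the index holds the country codes the result is labeled by country.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Country Code': ['AAA', 'BBB'],
    '2000': [100.0, 200.0],
    '2005': [110.0, 200.0],
})

# Default integer index: the result is labeled 0, 1
g_default = np.log(toy['2005']) - np.log(toy['2000'])

# Country codes as index: the result is labeled AAA, BBB
toy_indexed = toy.set_index('Country Code')
g_labeled = np.log(toy_indexed['2005']) - np.log(toy_indexed['2000'])
g_labeled.name = 'growth'
print(g_labeled)
```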

Let's delete the growth column from dfincome

In [ ]:
dfincome.drop('growth',axis=1, inplace=True)

Let's compute the growth of cell phone subscription for the period 2000-2005

In [ ]:
dfmobile.set_index('Country Code',inplace=True)
growthmobile=np.log(dfmobile['2005'])-np.log(dfmobile['2000'])
In [ ]:
growthmobile.name='mobile'
In [ ]:
growthmobile

Let's see the descriptive stats for each growth process

In [ ]:
growth.describe()
In [ ]:
growthmobile.describe()

Notice that growthmobile has an infinite mean. This happens because at least one country went from zero coverage in 2000 to positive coverage in 2005, so its log difference is infinite. Let's see for which countries that is the case and change those observations to NaN.

In [ ]:
# .ix was removed in pandas 1.0; .loc handles boolean selection
growthmobile.loc[growthmobile==np.inf]
In [ ]:
growthmobile.loc[growthmobile==np.inf]=np.nan
In [ ]:
growthmobile.describe()
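The same problem can be reproduced on a small example. The sketch below uses a hypothetical frame `toy_mobile` with invented values and replaces both positive and negative infinities in one step with `Series.replace`. This is an alternative to the boolean selection used above.

```python
import numpy as np
import pandas as pd

# Hypothetical subscriptions per 100 people; 'AAA' starts at zero
toy_mobile = pd.DataFrame({'2000': [0.0, 5.0, 10.0],
                           '2005': [3.0, 20.0, 10.0]},
                          index=['AAA', 'BBB', 'CCC'])

# np.log(0) is -inf, so the log difference for 'AAA' is +inf
with np.errstate(divide='ignore'):
    toy_growth = np.log(toy_mobile['2005']) - np.log(toy_mobile['2000'])

# Replace +inf and -inf with NaN in a single step
toy_clean = toy_growth.replace([np.inf, -np.inf], np.nan)

print(toy_growth.mean())  # infinite because of 'AAA'
print(toy_clean.mean())   # NaN values are skipped by default
```

Once the infinite value is NaN, `mean` and `describe` skip it and summarize only the remaining countries.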

Let's compute the correlation between both growth rates

In [ ]:
growth.corr(growthmobile)
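Note that `Series.corr` matches observations by index label and ignores pairs where either value is missing. This is why setting the country code as the index matters. A minimal sketch with two hypothetical series `a` and `b` (invented values, different ordering, one NaN, and one extra country):

```python
import numpy as np
import pandas as pd

# Two hypothetical series with different ordering and coverage
a = pd.Series([0.1, 0.2, 0.3, 0.4], index=['AAA', 'BBB', 'CCC', 'DDD'])
b = pd.Series([0.8, np.nan, 0.4, 0.2, 0.9],
              index=['DDD', 'CCC', 'BBB', 'AAA', 'EEE'])

# corr aligns on the index labels and drops incomplete pairs
r = a.corr(b)

# The same computation done by hand: align, drop NaN, then correlate
pairs = pd.concat([a, b], axis=1, keys=['a', 'b'], join='inner').dropna()
r_manual = np.corrcoef(pairs['a'], pairs['b'])[0, 1]

print(r, r_manual, len(pairs))
```

Only AAA, BBB, and DDD enter the calculation: CCC has a missing value in `b`, and EEE does not appear in `a`.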

Let's run an OLS regression of income growth on mobile subscription growth, but first we need to combine the two series into one data set. First, we use the pd.merge command, which merges two data frames on their common columns (here, Country Code).

In [ ]:
mydata=pd.merge(growth.reset_index(),growthmobile.reset_index())
mydata

Second, we use the pd.concat command, which concatenates series or data frames into a data frame. With axis=1 the series become columns aligned on their index.

In [ ]:
grates=pd.concat([growth,growthmobile],axis=1)
grates
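The two approaches can return different sets of rows when the series do not cover the same countries. By default `pd.merge` performs an inner join on the shared columns, while `pd.concat` with `axis=1` performs an outer join on the index. The sketch below uses two hypothetical series `g` and `m` with invented values.

```python
import pandas as pd

# Hypothetical growth series covering different countries
g = pd.Series([0.1, 0.2, 0.3],
              index=pd.Index(['AAA', 'BBB', 'CCC'], name='Country Code'),
              name='growth')
m = pd.Series([1.0, 2.0],
              index=pd.Index(['BBB', 'DDD'], name='Country Code'),
              name='mobile')

# Inner join on the common column 'Country Code': only 'BBB' survives
merged = pd.merge(g.reset_index(), m.reset_index())

# Outer join on the index: all four countries, NaN where data is missing
combined = pd.concat([g, m], axis=1)

# how='outer' makes merge keep all countries as well
merged_outer = pd.merge(g.reset_index(), m.reset_index(), how='outer')

print(merged)
print(combined)
```

For a regression that drops missing observations, both routes end up using the same complete rows.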

Now let's import the statsmodels module to run the regression.

In [ ]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex
In [ ]:
mod = sm.OLS(mydata['growth'],sm.add_constant(mydata['mobile']), missing='drop').fit()
mod.summary2()
In [ ]:
mod = smf.ols(formula='growth ~ mobile', data=mydata[['growth','mobile']], missing='drop').fit()
mod.summary2()
In [ ]:
mod = smf.ols(formula='growth ~ mobile', data=grates, missing='drop').fit()
mod.summary2()
In [ ]:
mysummary=mod.summary2()
Latex(mysummary.as_latex())
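To make explicit what these regressions compute, here is a sketch using only NumPy and pandas on a hypothetical data set `toy` with invented values. It illustrates the arithmetic and does not describe how statsmodels is implemented. It drops incomplete rows (the effect of missing='drop'), adds a constant column (as sm.add_constant does), and solves the least-squares problem.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
toy = pd.DataFrame({'growth': [0.10, 0.25, np.nan, 0.40, 0.55],
                    'mobile': [1.0, 2.0, 3.0, np.nan, 4.0]})

# Keep only complete rows, which is the effect of missing='drop'
complete = toy.dropna()

# Design matrix: a column of ones (the constant) plus the regressor
X = np.column_stack([np.ones(len(complete)), complete['mobile']])
y = complete['growth'].to_numpy()

# Least-squares solution of y = X @ beta
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# With a single regressor the slope equals cov(x, y) / var(x)
slope_check = complete['growth'].cov(complete['mobile']) / complete['mobile'].var()

print(intercept, slope, slope_check)
```

The three statsmodels calls above should give the same coefficients as one another, since they differ only in how the data are passed in.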

Homework

Using pandas and statsmodels, write a Python script that:

  1. Downloads and opens the data from the Penn World Tables (PWT), versions 7.1 and 8.0.
  2. Uses the PWT data to estimate the contribution of technological progress (TFP) with the growth accounting framework studied in class.
  3. Uses the PWT data to calibrate productivity differences with the framework studied in class.
  4. Uses the PWT data to replicate the MRW analysis.

Some additional useful tools:

  • LaTeX Output
  • Plotting
In [ ]:
%%latex
\begin{eqnarray}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0 
\end{eqnarray}

Some examples of plots...

In [ ]:
dfincome[[str(i) for i in range(1990,2013)]].loc['USA'].plot()
In [ ]:
plt.scatter(grates.growth,grates.mobile)
In [ ]:
from pandas_datareader import data, wb
dfwbcountries = wb.get_countries()
dfwbcountries['name'] = dfwbcountries.name.str.strip()
popvars = wb.search(string='population')
popfields = ['SP.POP.0014.FE.IN', 'SP.POP.1564.FE.IN', 'SP.POP.65UP.FE.IN',
             'SP.POP.0014.MA.IN', 'SP.POP.1564.MA.IN', 'SP.POP.65UP.MA.IN',
             'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL',
             'EN.URB.MCTY', 'EN.URB.LCTY']
wdi = wb.download(indicator=popfields, country=dfwbcountries.iso2c.values, start=2017, end=2017)
wdi.reset_index(inplace=True)
wdi = dfwbcountries.merge(wdi, left_on='name', right_on='country')
wdi['ISO_CODE'] = wdi.iso2c.str.strip()
wdi.set_index('ISO_CODE', inplace=True)
wdi.to_csv('./WDI/wdipop.csv', encoding='utf-8')

Examples